Re: SolrCloud: Programmatically create multiple collections?

2013-08-13 Thread xinwu
Thank you Ani.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Programmatically-create-multiple-collections-tp3916927p4084485.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud: Programmatically create multiple collections?

2013-08-13 Thread xinwu
Hey Shawn. Thanks for your reply.
I just want to access the base_url easily by a short instanceDir name.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Programmatically-create-multiple-collections-tp3916927p4084480.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing hangs when more than 1 server in a cluster

2013-08-13 Thread Kevin Osborn
Interesting, that did work. Do you or anyone else have any ideas on what I
should look at? While soft commit is not a requirement in my project, my
understanding is that it should help performance. On the same index, I will
be doing a large number of queries as well as updates.

If I have to disable autoCommit, should I increase the chunk size?

Of course, I will have to run a larger-scale test tomorrow, but I saw
this problem fairly consistently in my smaller test.

In a previous experiment, I applied the SOLR-4816 patch that someone
indicated might help. I also reduced the CSV upload chunk size to 500. It
seemed like things got a little better, but still eventually hung.

I also see SOLR-5081, but I don't know if that is my issue or not. At least
in my test, the index writes are not parallel as in the ticket.

-Kevin


On Tue, Aug 13, 2013 at 8:40 PM, Jason Hellman <
jhell...@innoventsolutions.com> wrote:

> While I don't have a past history of this issue to use as reference, if I
> were in your shoes I would consider trying your updates with softCommit
> disabled.  My suspicion is you're experiencing some issue with the
> transaction logging and how it's managed when your hard commit occurs.
>
> If you can give that a try and let us know how that fares we might have
> some further input to share.
>
>
> On Aug 13, 2013, at 11:54 AM, Kevin Osborn  wrote:
>
> > I am using Solr Cloud 4.4. It is pretty much a base configuration. We
> have
> > 2 servers and 3 collections. Collection1 has 1 shard, and Collection2 and
> > Collection3 both have 2 shards. Both servers are identical.
> >
> > So, here is my process: I do a lot of queries on Collection1 and
> > Collection2. I then do a bunch of inserts into Collection3. I am doing
> CSV
> > uploads. I am also doing custom shard routing. All the products in a
> single
> > upload will have the same shard key. All Solr interaction is through
> SolrJ
> > with full Zookeeper awareness. My uploads are also using soft commits.
> >
> > I tried this on a record set of 936 products. Everything worked fine. I
> > then sent over a record set of 300k products. The upload into Collection3
> > is chunked. I tried both 1000 and 200,000 with similar results. The first
> > upload to Solr would just hang. There would simply be no response from
> > Solr. A few of the products from this request would make it into the
> index,
> > but not many.
> >
> > In this state, queries continued to work, but deletes did not.
> >
> > My only solution was to kill each Solr process.
> >
> > As an experiment, I did the large catalog first. First, I reset
> everything.
> > With a chunk size of 1000, about 110,000 out of 300,000 records made it
> > into Solr before the process hung. Again, queries worked, but deletes did
> > not and I had to kill Solr. It hung after about 30 seconds. Timing-wise,
> > this is at about the second autocommit cycle, given the default
> autocommit
> > of 15 seconds. I am not sure if this is related or not.
> >
> > As an additional experiment, I ran the entire test with just a single
> node
> > in the cluster. This time, everything ran fine.
> >
> > Does anyone have any ideas? Everything is pretty default. These servers
> are
> > Azure VMs, although I have seen similar behavior running two Solr
> instances
> > on a single internal server as well.
> >
> > I had also noticed similar behavior before with Solr 4.3. It definitely
> has
> > something to do with the clustering, but I am not sure what. And I don't see
> > any error message (or really anything else) in the Solr logs.
> >
> > Thanks.
> >
> > --
> > *KEVIN OSBORN*
> > LEAD SOFTWARE ENGINEER
> > CNET Content Solutions
> > OFFICE 949.399.8714
> > CELL 949.310.4677  SKYPE osbornk
> > 5 Park Plaza, Suite 600, Irvine, CA 92614
> > [image: CNET Content Solutions]
>
>


-- 
*KEVIN OSBORN*
LEAD SOFTWARE ENGINEER
CNET Content Solutions
OFFICE 949.399.8714
CELL 949.310.4677  SKYPE osbornk
5 Park Plaza, Suite 600, Irvine, CA 92614
[image: CNET Content Solutions]


Re: Handling categories( level one and two) based navigation

2013-08-13 Thread tamanjit.bin...@yahoo.co.in
This may be helpful, especially the last bit:

http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html
 
 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Handling-categories-level-one-and-two-based-navigation-tp4083259p4084477.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing hangs when more than 1 server in a cluster

2013-08-13 Thread Jason Hellman
While I don't have a past history of this issue to use as reference, if I were 
in your shoes I would consider trying your updates with softCommit disabled.  
My suspicion is you're experiencing some issue with the transaction logging and 
how it's managed when your hard commit occurs.

If you can give that a try and let us know how that fares we might have some 
further input to share.


On Aug 13, 2013, at 11:54 AM, Kevin Osborn  wrote:

> I am using Solr Cloud 4.4. It is pretty much a base configuration. We have
> 2 servers and 3 collections. Collection1 has 1 shard, and Collection2 and
> Collection3 both have 2 shards. Both servers are identical.
> 
> So, here is my process: I do a lot of queries on Collection1 and
> Collection2. I then do a bunch of inserts into Collection3. I am doing CSV
> uploads. I am also doing custom shard routing. All the products in a single
> upload will have the same shard key. All Solr interaction is through SolrJ
> with full Zookeeper awareness. My uploads are also using soft commits.
> 
> I tried this on a record set of 936 products. Everything worked fine. I
> then sent over a record set of 300k products. The upload into Collection3
> is chunked. I tried both 1000 and 200,000 with similar results. The first
> upload to Solr would just hang. There would simply be no response from
> Solr. A few of the products from this request would make it into the index,
> but not many.
> 
> In this state, queries continued to work, but deletes did not.
> 
> My only solution was to kill each Solr process.
> 
> As an experiment, I did the large catalog first. First, I reset everything.
> With a chunk size of 1000, about 110,000 out of 300,000 records made it
> into Solr before the process hung. Again, queries worked, but deletes did
> not and I had to kill Solr. It hung after about 30 seconds. Timing-wise,
> this is at about the second autocommit cycle, given the default autocommit
> of 15 seconds. I am not sure if this is related or not.
> 
> As an additional experiment, I ran the entire test with just a single node
> in the cluster. This time, everything ran fine.
> 
> Does anyone have any ideas? Everything is pretty default. These servers are
> Azure VMs, although I have seen similar behavior running two Solr instances
> on a single internal server as well.
> 
> I had also noticed similar behavior before with Solr 4.3. It definitely has
> something to do with the clustering, but I am not sure what. And I don't see
> any error message (or really anything else) in the Solr logs.
> 
> Thanks.
> 
> -- 
> *KEVIN OSBORN*
> LEAD SOFTWARE ENGINEER
> CNET Content Solutions
> OFFICE 949.399.8714
> CELL 949.310.4677  SKYPE osbornk
> 5 Park Plaza, Suite 600, Irvine, CA 92614
> [image: CNET Content Solutions]



Re: SOLR4 Spatial sorting and query string

2013-08-13 Thread David Smiley (@MITRE.org)
Hi Roy.

Using the example schema and data, and copying the "store" field to
"store_rpt" indexed with location_rpt field type, try this query:

http://localhost:8983/solr/select?indent=true&fl=name,store&q=*:*&sort=query%28{!geofilt%20score=distance%20filter=false%20sfield=store_rpt%20pt=45.15,-93.85%20d=0%20v=%27%27}%29%20asc

sort spec without URL encoding:  query({!geofilt score=distance filter=false
sfield=store_rpt pt=45.15,-93.85 d=0 v=''}) asc

One of the tricks there is that it's sorting on the query() function query,
which references a query that has its score used as the result of query(). 
And you put a spatial query in there (note: score=distance only works with
rpt), and voila.  The fact that v='' is needed appears to be a Solr bug.  I
realize this is super awkward, so this approach isn't well documented if at
all.  In Solr 4.5 you can simply use geodist().

Note: if you have only one point per document, I recommend sorting by
LatLonType.
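
To combine this with a keyword search (the original question), the keyword part
just goes in q and the spatial part stays in the sort; with the same example
data, something like:

q=name:ipod&fl=name,store&sort=query({!geofilt score=distance filter=false sfield=store_rpt pt=45.15,-93.85 d=0 v=''}) asc

(URL-encoded the same way as the full URL above.) The sort is independent of q,
so any query syntax works there.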

~ David


roySolr wrote
> Hello,
> 
> I use the following distance sorting of SOLR
> 4(solr.SpatialRecursivePrefixTreeFieldType):
> 
> fl=*,score&sort=score asc&q={!geofilt score=distance filter=false
> sfield=coords pt=54.729696,-98.525391 d=10}  
> 
> (from the tutorial on
> http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4)
> 
> Now I want to query on a search string and still want to sort on distance.
> How can I combine this in the above Solr request? When I add something to the
> "q=" it doesn't work. I tried a _query_ subquery and other stuff but I can't
> get it working.
> 
> I appreciate any help,
> Thanks





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR4-Spatial-sorting-and-query-string-tp4084318p4084453.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Understanding solrcloud/multi-server loadbalancing

2013-08-13 Thread Shawn Heisey

On 8/13/2013 4:47 PM, Torsten Albrecht wrote:

I have a Solr 3.6 infrastructure with 4 servers (24 cores/128GB each, ~15 shards
on every server) and 70 million documents.

Now I am setting up a new Solr 4 infrastructure on the same hardware. I am
reducing the shards and will have only 6 shards.

But I don't understand the difference between SolrCloud and multi-server
load balancing. And is SolrCloud the better way (more performance)?


LoadBalancer -> solrcloud (4 Nodes)

LoadBalancer -> 4 solr server with the same shards


With SolrCloud, Solr automates a LOT of things and takes care of 
redundancy for you.  You can index to any core/shard in the entire cloud 
and Solr takes care of routing the updates to the correct shard and all 
of its replicas.   You can also send queries to any core/shard in the 
cloud and they will be automatically balanced across the cloud.  If part 
of your cloud goes down and you've designed it right, everything keeps 
working, and the down machine will automatically be synchronized with 
the cloud when it comes back up.


With traditional sharding, redundancy requires designating masters and 
slaves and setting up replication.  You can only index to masters, and 
you have to figure out which shard to index to.


If all your client code is Java, you don't need a load balancer - the 
CloudSolrServer object talks to zookeeper and figures out what nodes are 
available in realtime.  You can continue to use a load balancer if you wish.
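
A minimal SolrJ sketch of that (Solr 4.x API; the ZooKeeper host string and
collection name below are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CloudClientExample {
    public static void main(String[] args) throws Exception {
        // the argument is the ZooKeeper ensemble, not a list of Solr nodes
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");
        // the client discovers live nodes from ZK and spreads requests across them
        QueryResponse rsp = server.query(new SolrQuery("*:*"));
        System.out.println("numFound=" + rsp.getResults().getNumFound());
        server.shutdown();
    }
}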


Thanks,
Shawn



Understanding solrcloud/multi-server loadbalancing

2013-08-13 Thread Torsten Albrecht
I have a Solr 3.6 infrastructure with 4 servers (24 cores/128GB each, ~15 shards
on every server) and 70 million documents.

Now I am setting up a new Solr 4 infrastructure on the same hardware. I am
reducing the shards and will have only 6 shards.

But I don't understand the difference between SolrCloud and multi-server
load balancing. And is SolrCloud the better way (more performance)?


LoadBalancer -> solrcloud (4 Nodes)

LoadBalancer -> 4 solr server with the same shards


Is there a benefit to SolrCloud in this case?


Regards,

Torsten


Re: Measuring SOLR performance

2013-08-13 Thread Roman Chyla
Hi Dmitry, oh yes, late night fixes... :) The latest commit should make it
work for you.
Thanks!

roman


On Tue, Aug 13, 2013 at 3:37 AM, Dmitry Kan  wrote:

> Hi Roman,
>
> Something bad happened in fresh checkout:
>
> python solrjmeter.py -a -x ./jmx/SolrQueryTest.jmx -q
> ./queries/demo/demo.queries -s localhost -p 8983 -a --durationInSecs 60 -R
> cms -t /solr/statements -e statements -U 100
>
> Traceback (most recent call last):
>   File "solrjmeter.py", line 1392, in 
> main(sys.argv)
>   File "solrjmeter.py", line 1347, in main
> save_into_file('before-test.json', simplejson.dumps(before_test))
>   File "/usr/lib/python2.7/dist-packages/simplejson/__init__.py", line 286,
> in dumps
> return _default_encoder.encode(obj)
>   File "/usr/lib/python2.7/dist-packages/simplejson/encoder.py", line 226,
> in encode
> chunks = self.iterencode(o, _one_shot=True)
>   File "/usr/lib/python2.7/dist-packages/simplejson/encoder.py", line 296,
> in iterencode
> return _iterencode(o, 0)
>   File "/usr/lib/python2.7/dist-packages/simplejson/encoder.py", line 202,
> in default
> raise TypeError(repr(o) + " is not JSON serializable")
> TypeError: <__main__.ForgivingValue object at 0x7fc6d4040fd0> is not JSON
> serializable
>
>
> Regards,
>
> D.
>
>
> On Tue, Aug 13, 2013 at 8:10 AM, Roman Chyla 
> wrote:
>
> > Hi Dmitry,
> >
> >
> >
> > On Mon, Aug 12, 2013 at 9:36 AM, Dmitry Kan 
> wrote:
> >
> > > Hi Roman,
> > >
> > > Good point. I managed to run the command with -C and double quotes:
> > >
> > > python solrjmeter.py -a -C "g1,cms" -c hour -x ./jmx/SolrQueryTest.jmx
> > >
> > > As a result got several files (html, css, js, csv) in the running
> > directory
> > > (any way to specify where the output should be stored in this case?)
> > >
> >
> > i know it is confusing, i plan to change it - but later, now it is too
> busy
> > here...
> >
> >
> > >
> > > When I look onto the comparison dashboard, I see this:
> > >
> > > http://pbrd.co/17IRI0b
> > >
> >
> > two things: the tests probably took more than one hour to finish, so they
> > are not aligned - try generating the comparison with '-c  14400'  (ie.
> > 4x3600 secs)
> >
> > the other thing: if you have only two datapoints, the dygraph will not
> show
> > anything - there must be more datapoints/measurements
> >
> >
> >
> > >
> > > One more thing: all the previous tests were run with softCommit
> disabled.
> > > After enabling it, the tests started to fail:
> > >
> > > $ python solrjmeter.py -a -x ./jmx/SolrQueryTest.jmx -q
> > > ./queries/demo/demo.queries -s localhost -p 8983 -a --durationInSecs 60
> > -R
> > > g1 -t /solr/statements -e statements -U 100
> > > $ cd g1
> > > Reading results of the previous test
> > > $ cd 2013.08.12.16.32.48
> > > $ cd /home/dmitry/projects/lab/solrjmeter4/solrjmeter/g1
> > > $ mkdir 2013.08.12.16.33.02
> > > $ cd 2013.08.12.16.33.02
> > > $ cd /home/dmitry/projects/lab/solrjmeter4/solrjmeter/g1
> > > $ cd /home/dmitry/projects/lab/solrjmeter4/solrjmeter
> > > $ cd /home/dmitry/projects/lab/solrjmeter4/solrjmeter
> > > Traceback (most recent call last):
> > >   File "solrjmeter.py", line 1427, in 
> > > main(sys.argv)
> > >   File "solrjmeter.py", line 1381, in main
> > > before_test = harvest_details_about_montysolr(options)
> > >   File "solrjmeter.py", line 562, in harvest_details_about_montysolr
> > > indexLstModified =
> cores_data['status'][cn]['index']['lastModified'],
> > > KeyError: 'lastModified'
> > >
> >
> > Thanks for letting me know, that info is probably not available in this
> > situation - I've cooked up something quick to fix it, please try the latest commit
> > (hope it doesn't do more harm, i should get some sleep ..;))
> >
> > roman
> >
> >
> > >
> > > In case it matters:  Python 2.7.3, ubuntu, solr 4.3.1.
> > >
> > > Thanks,
> > >
> > > Dmitry
> > >
> > >
> > > On Thu, Aug 8, 2013 at 2:22 AM, Roman Chyla 
> > wrote:
> > >
> > > > Hi Dmitry,
> > > > The command seems good. Are you sure your shell is not doing
> something
> > > > funny with the params? You could try:
> > > >
> > > > python solrjmeter.py -C "g1,foo" -c hour -x ./jmx/SolrQueryTest.jmx
> -a
> > > >
> > > > where g1 and foo are results of the individual runs, ie. something
> that
> > > was
> > > > started and saved with '-R g1' and '-R foo' respectively
> > > >
> > > > so, for example, i have these comparisons inside
> > > > '/var/lib/montysolr/different-java-settings/solrjmeter', so I am
> > > generating
> > > > the comparison by:
> > > >
> > > > export
> > > > SOLRJMETER_HOME=/var/lib/montysolr/different-java-settings/solrjmeter
> > > > python solrjmeter.py -C "g1,foo" -c hour -x ./jmx/SolrQueryTest.jmx
> -a
> > > >
> > > >
> > > > roman
> > > >
> > > >
> > > > On Wed, Aug 7, 2013 at 10:03 AM, Dmitry Kan 
> > > wrote:
> > > >
> > > > > Hi Roman,
> > > > >
> > > > > One more question. I tried to compare different runs (g1 vs cms)
> > using
> > > > the
> > > > > command below, but get an error. Should I attach some other
> pa

Re: Percolate feature?

2013-08-13 Thread Mark
Any ideas?

On Aug 10, 2013, at 6:28 PM, Mark  wrote:

> Our schema is pretty basic.. nothing fancy going on here
> 
> 
>  
>
>
> protected="protected.txt"/>
> generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" 
> preserveOriginal="1"/>
>
>
>  
>   
>
>
> ignoreCase="true" expand="true"/>
> protected="protected.txt"/>
> generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" 
> preserveOriginal="1"/>
>
>
>  
>
> 
> 
> On Aug 10, 2013, at 3:40 PM, "Jack Krupansky"  wrote:
> 
>> Now we're getting somewhere!
>> 
>> To (over-simplify), you simply want to know if a given "listing" would match 
>> a high-value pattern, either in a "clean" manner (obvious keywords) or in an 
>> "unclean" manner (e.g., fuzzy keyword matching, stemming, n-grams.)
>> 
>> To a large extent, this also depends on how rich and powerful your end-user query 
>> support is. So, if the user searches for "sony", "samsung", or "apple", will 
>> it match some oddball listing that fuzzily matches those terms.
>> 
>> So... tell us, how rich your query interface is. I mean, do you support 
>> wildcard, fuzzy query, ngrams (e.g., can they type "son" or "sam" or "app", 
>> or... will "sony" match "sonblah-blah")?
>> 
>> Reverse-search may in fact be what you need in this case since you literally 
>> do mean "if I index this document, will it match any of these queries" (but 
>> doesn't score a hit on your direct check for whether it is a clean keyword 
>> match.)
>> 
>> In your previous examples you only gave clean product titles, not examples 
>> of circumventions of simple keyword matches.
>> 
>> -- Jack Krupansky
>> 
>> -Original Message- From: Mark
>> Sent: Saturday, August 10, 2013 6:24 PM
>> To: solr-user@lucene.apache.org
>> Cc: Chris Hostetter
>> Subject: Re: Percolate feature?
>> 
>>> So to reiteratve your examples from before, but change the "labels" a
>>> bit and add some more converse examples (and ignore the "highlighting"
>>> aspect for a moment...
>>> 
>>> doc1 = "Sony"
>>> doc2 = "Samsung Galaxy"
>>> doc3 = "Sony Playstation"
>>> 
>>> queryA = "Sony Experia"   ... matches only doc1
>>> queryB = "Sony Playstation 3" ... matches doc3 and doc1
>>> queryC = "Samsung 52inch LC"  ... doesn't match anything
>>> queryD = "Samsung Galaxy S4"  ... matches doc2
>>> queryE = "Galaxy Samsung S4"  ... matches doc2
>>> 
>>> 
>>> ...do i still have that correct?
>> 
>> Yes
>> 
>>> 2) if you *do* care about using non-trivial analysis, then you can't use
>>> the simple "termfreq()" function, which deals with raw terms -- in stead
>>> you have to use the "query()" function to ensure that the input is parsed
>>> appropriately -- but then you have to wrap that function in something that
>>> will normalize the scores - so in place of termfreq('words','Galaxy')
>>> you'd want something like...
>> 
>> 
>> Yes, we will be using non-trivial analysis. Now here's another twist… what if 
>> we don't care about scoring?
>> 
>> 
>> Let's talk about the real use case. We are a marketplace that sells products 
>> that users have listed. For certain popular, high risk or restricted 
>> keywords we charge the seller an extra fee/ban the listing. We now have 
>> sellers purposely misspelling their listings to circumvent this fee. They 
>> will start adding suffixes to their product listings such as "Sonies" 
>> knowing that it gets indexed down to "Sony" and thus matching a user's query 
>> for Sony. Or they will munge together numbers and products… "2013Sony". Same 
>> thing goes for adding crazy non-ascii characters to the front of the keyword 
>> "ΒSony". This is obviously a problem because we aren't charging for these 
>> keywords and more importantly it makes our search results look like shit.
>> 
>> We would like to:
>> 
>> 1) Detect when a certain keyword is in a product title at listing time so we 
>> may charge the seller. This was my idea of a "reverse search" although 
>> sounds like I may have caused too much confusion with that term.
>> 2) Attempt to autocorrect these titles hence the need for highlighting so we 
>> can try and replace the terms… this is of course done outside of Solr via an 
>> external service.
>> 
>> Since we do some stemming (KStemmer) and filtering 
>> (WordDelimiterFilterFactory) this makes conventional approaches such as 
>> regex quite troublesome. Regex is also quite slow and scales horribly and 
>> always needs to be in lockstep with schema changes.
>> 
>> Now knowing this, is there a good way to approach this?
>> 
>> Thanks
>> 
>> 
>> On Aug 9, 2013, at 11:56 AM, Chris Hostetter  
>> wrote:
>> 
>>> 
>>> : I'll look into this. Thanks for the concrete example as I don't even
>>> : know which classes to start to look at to implement such a feature.
>>> 
>>> Either roman isn't understanding what you are asking for, or I'm not -- but 
>>> i don't think what roman des

Re: Handling categories( level one and two) based navigation

2013-08-13 Thread payalsharma
Hi Erick,

Yeah, a document can belong to multiple subcategory hierarchies. Also, we
will be having multi-level categorization, unlike the 2 levels I previously
mentioned. 

like : Electronics > Phones > Google Nexus ... 

Also, since Solr does not support relational joins, shall I fetch the
categories and subcategories from the DB directly and then use the facet.pivot
feature to do category navigation and searching of documents, using Solr
4.4.0?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Handling-categories-level-one-and-two-based-navigation-tp4083259p4084387.html
Sent from the Solr - User mailing list archive at Nabble.com.


Indexing hangs when more than 1 server in a cluster

2013-08-13 Thread Kevin Osborn
I am using Solr Cloud 4.4. It is pretty much a base configuration. We have
2 servers and 3 collections. Collection1 has 1 shard, and Collection2 and
Collection3 both have 2 shards. Both servers are identical.

So, here is my process: I do a lot of queries on Collection1 and
Collection2. I then do a bunch of inserts into Collection3. I am doing CSV
uploads. I am also doing custom shard routing. All the products in a single
upload will have the same shard key. All Solr interaction is through SolrJ
with full Zookeeper awareness. My uploads are also using soft commits.

I tried this on a record set of 936 products. Everything worked fine. I
then sent over a record set of 300k products. The upload into Collection3
is chunked. I tried both 1000 and 200,000 with similar results. The first
upload to Solr would just hang. There would simply be no response from
Solr. A few of the products from this request would make it into the index,
but not many.

In this state, queries continued to work, but deletes did not.

My only solution was to kill each Solr process.

As an experiment, I did the large catalog first. First, I reset everything.
With a chunk size of 1000, about 110,000 out of 300,000 records made it
into Solr before the process hung. Again, queries worked, but deletes did
not and I had to kill Solr. It hung after about 30 seconds. Timing-wise,
this is at about the second autocommit cycle, given the default autocommit
of 15 seconds. I am not sure if this is related or not.

As an additional experiment, I ran the entire test with just a single node
in the cluster. This time, everything ran fine.

Does anyone have any ideas? Everything is pretty default. These servers are
Azure VMs, although I have seen similar behavior running two Solr instances
on a single internal server as well.

I had also noticed similar behavior before with Solr 4.3. It definitely has
something to do with the clustering, but I am not sure what. And I don't see
any error message (or really anything else) in the Solr logs.

Thanks.

-- 
*KEVIN OSBORN*
LEAD SOFTWARE ENGINEER
CNET Content Solutions
OFFICE 949.399.8714
CELL 949.310.4677  SKYPE osbornk
5 Park Plaza, Suite 600, Irvine, CA 92614
[image: CNET Content Solutions]


Re: adding custom fields to solr response

2013-08-13 Thread Jack Krupansky

The "fl" parameter controls what appears for documents in the response.

You can add function queries to the fl list, including aliases, such as:

fl=*,Four:sum(2,2)

You could do a custom writer if you really need to "mangle" the actual 
document output.


The bottom line is that fl will determine what gets output for a document, 
so you would have to "mangle" the "returnFields" list for the query to add 
additional items.
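
If you do go the DocTransformer route (which is what the original question asks
about), a rough sketch against the Solr 4.x transformer API looks like this --
the class, package, and field names here are made up for illustration:

package com.example;

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.transform.DocTransformer;
import org.apache.solr.response.transform.TransformerFactory;

// registered in solrconfig.xml as:
//   <transformer name="extra" class="com.example.ExtraFieldTransformerFactory"/>
// and requested with fl=*,[extra]
public class ExtraFieldTransformerFactory extends TransformerFactory {
  @Override
  public DocTransformer create(final String field, SolrParams params, SolrQueryRequest req) {
    return new DocTransformer() {
      @Override
      public String getName() {
        return field;  // the label used in the fl list, e.g. [extra]
      }
      @Override
      public void transform(SolrDocument doc, int docid) {
        // add a computed value to each document as it is written out
        doc.setField("my_custom_field", "computed value for doc " + docid);
      }
    };
  }
}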


-- Jack Krupansky

-Original Message- 
From: Rohit Harchandani

Sent: Tuesday, August 13, 2013 2:26 PM
To: solr-user@lucene.apache.org
Subject: adding custom fields to solr response

Hi,
I have created a custom component with some post filtering ability. Now I
am trying to add certain fields to the Solr response. I was able to add it
as a separate response section, but I am having difficulty adding it to the
docs themselves. Is there an example of any component which adds fields to
the docs using DocTransformer?
Thanks,
Rohit 



adding custom fields to solr response

2013-08-13 Thread Rohit Harchandani
Hi,
I have created a custom component with some post filtering ability. Now I
am trying to add certain fields to the Solr response. I was able to add it
as a separate response section, but I am having difficulty adding it to the
docs themselves. Is there an example of any component which adds fields to
the docs using DocTransformer?
Thanks,
Rohit


Re: Of tlogs and atomic updates

2013-08-13 Thread Yonik Seeley
On Tue, Aug 13, 2013 at 1:56 PM, Erick Erickson  wrote:
> Thanks. I suppose considering NRT there's no other choice.

Less to do with NRT, and more to do with everything else.  But yes,
there is currently no other choice (i.e. things wouldn't otherwise
work)

-Yonik
http://lucidworks.com

>
> On Tue, Aug 13, 2013 at 10:35 AM, Yonik Seeley  wrote:
>
>> On Tue, Aug 13, 2013 at 10:11 AM, Erick Erickson
>>  wrote:
>> > A question recently came up: Does the tlog store the entire document when
>> > an atomic update happens or just the incoming delta? My guess is that it
>> > stores the entire document, but that's a guess...
>>
>> Correct.
>>
>> -Yonik
>> http://lucidworks.com
>>


Re: Of tlogs and atomic updates

2013-08-13 Thread Erick Erickson
Thanks. I suppose considering NRT there's no other choice.


On Tue, Aug 13, 2013 at 10:35 AM, Yonik Seeley  wrote:

> On Tue, Aug 13, 2013 at 10:11 AM, Erick Erickson
>  wrote:
> > A question recently came up: Does the tlog store the entire document when
> > an atomic update happens or just the incoming delta? My guess is that it
> > stores the entire document, but that's a guess...
>
> Correct.
>
> -Yonik
> http://lucidworks.com
>


Re: autocomplete feature - where to begin

2013-08-13 Thread Cassandra Targett
The autocomplete feature in Solr is built on the spell checker
component, and is called Suggester, which is why you've seen both of
those mentioned. It's implemented with a searchComponent and a
requestHandler.

The Solr Reference Guide has a decent overview of how to implement it
and I just made a few edits to make what needs to be done a bit more
clear:

https://cwiki.apache.org/confluence/display/solr/Suggester

If you have suggestions for improvements to that doc (such as steps
that aren't clear), you're welcome to set up an account there and
leave a comment.
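
For reference, a minimal Suggester setup along the lines described there looks
roughly like this in solrconfig.xml (Solr 4.x; the source field "name", the
handler path, and the lookupImpl choice are just examples -- see the page above
for the available options):

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.WFSTLookupFactory</str>
    <str name="field">name</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">5</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

A request then looks like /suggest?q=par and the completions come back in the
spellcheck section of the response.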

Cassandra

On Tue, Aug 13, 2013 at 11:16 AM, Mysurf Mail  wrote:
> I have indexed the data from the db and so far it searches really well.
> Now I want to create an auto-complete/suggest feature on my website.
> So far I have seen articles about Suggester, spellchecker, and
> searchComponents.
> Can someone point me to a good article about basic autocomplete
> implementation?


Re: Split Shard Error - maxValue must be non-negative

2013-08-13 Thread Greg Preston
I'm running into the same issue using composite routing keys when all of
the shard keys end up in one of the subshards.

-Greg


On Tue, Aug 13, 2013 at 9:34 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Scratch that. I obviously didn't pay attention to the stack trace.
> There is no workaround until 4.5 for this issue because we split the
> range by half and thus cannot guarantee that all segments will have
> numDocs > 0.
>
> On Tue, Aug 13, 2013 at 9:25 PM, Shalin Shekhar Mangar
>  wrote:
> > On Tue, Aug 13, 2013 at 9:15 PM, Robert Muir  wrote:
> >> On Tue, Aug 13, 2013 at 11:39 AM, Shalin Shekhar Mangar
> >>  wrote:
> >>> The splitting code calls commit before it starts the splitting. It
> creates
> >>> a LiveDocsReader using a bitset created by the split. This reader is
> merged
> >>> to an index using addIndexes.
> >>>
> >>> Shouldn't the addIndexes code then ignore all such 0-document segments?
> >>>
> >>>
> >>
> >> Not in 4.4: https://issues.apache.org/jira/browse/LUCENE-5116
> >
> >
> > Sorry, I didn't notice that. So 4.4 users must call commit/optimize
> > with expungeDeletes="true" until 4.5 is released if they run into this
> > problem.
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: SOLR memory usage (sort fields? replication?)

2013-08-13 Thread Andrea Gazzarini

More on this, I think I found something...

Slave admin console --> stats.jsp#cache, FieldCache
...
entries count: 22
entry#0 : 
'MMapIndexInput(path="/home/agazzarini/solr-indexes/slave-data-dir/cbt/main/data/index/_mp.frq")'=>'title_sort',class 


...
entry#9 : 
'MMapIndexInput(path="/home/agazzarini/solr-indexes/slave-data-dir/cbt/main/data/index/_mr.frq")'=>'title_sort',class

...
entry#14 : 
'MMapIndexInput(path="/home/agazzarini/solr-indexes/slave-data-dir/cbt/main/data/index/_mn.frq")'=>'title_sort',class



The data directory (on the slave) doesn't contain _mn.* and _mp.* 
but only _mr.*!

> ls -la

drwxr-xr-x 2 agazzarini agazzarini   4096 Aug 13 18:17 .
drwxr-xr-x 3 agazzarini agazzarini   4096 Aug 13 16:32 ..
-rw-r--r-- 1 agazzarini agazzarini 8184675552 Aug 13 16:17 _mr.fdt
-rw-r--r-- 1 agazzarini agazzarini   14019212 Aug 13 16:17 _mr.fdx
-rw-r--r-- 1 agazzarini agazzarini   4957 Aug 13 16:08 _mr.fnm
-rw-r--r-- 1 agazzarini agazzarini  904512239 Aug 13 16:19 _mr.frq
-rw-r--r-- 1 agazzarini agazzarini  340972819 Aug 13 16:19 _mr.prx
-rw-r--r-- 1 agazzarini agazzarini   14154155 Aug 13 16:19 _mr.tii
-rw-r--r-- 1 agazzarini agazzarini  820714274 Aug 13 16:19 _mr.tis
-rw-r--r-- 1 agazzarini agazzarini3504631 Aug 13 16:20 _mr.tvd
-rw-r--r-- 1 agazzarini agazzarini  288506509 Aug 13 16:20 _mr.tvf
-rw-r--r-- 1 agazzarini agazzarini   28038420 Aug 13 16:20 _mr.tvx
-rw-r--r-- 1 agazzarini agazzarini 20 Aug 13 13:53 segments.gen
-rw-r--r-- 1 agazzarini agazzarini287 Aug 13 16:20 segments_i

On the master node, I have both _mp and _mr segments... (why? Is it something 
like a commit point? If so, why does the SLAVE admin console show me 
something about the _mp segment?)


-rw-r--r-- 1 agazzarini agazzarini 8184675552 Aug 13 14:24 _mp.fdt
-rw-r--r-- 1 agazzarini agazzarini   14019212 Aug 13 14:24 _mp.fdx
-rw-r--r-- 1 agazzarini agazzarini   4957 Aug 13 14:18 _mp.fnm
-rw-r--r-- 1 agazzarini agazzarini  904512316 Aug 13 14:26 _mp.frq
-rw-r--r-- 1 agazzarini agazzarini  340972819 Aug 13 14:26 _mp.prx
-rw-r--r-- 1 agazzarini agazzarini   14154155 Aug 13 14:26 _mp.tii
-rw-r--r-- 1 agazzarini agazzarini  820714274 Aug 13 14:26 _mp.tis
-rw-r--r-- 1 agazzarini agazzarini3504631 Aug 13 14:26 _mp.tvd
-rw-r--r-- 1 agazzarini agazzarini  288506509 Aug 13 14:26 _mp.tvf
-rw-r--r-- 1 agazzarini agazzarini   28038420 Aug 13 14:26 _mp.tvx
-rw-r--r-- 1 agazzarini agazzarini 8184675552 Aug 13 16:17 _mr.fdt
-rw-r--r-- 1 agazzarini agazzarini   14019212 Aug 13 16:17 _mr.fdx
-rw-r--r-- 1 agazzarini agazzarini   4957 Aug 13 16:08 _mr.fnm
-rw-r--r-- 1 agazzarini agazzarini  904512239 Aug 13 16:19 _mr.frq
-rw-r--r-- 1 agazzarini agazzarini  340972819 Aug 13 16:19 _mr.prx
-rw-r--r-- 1 agazzarini agazzarini   14154155 Aug 13 16:19 _mr.tii
-rw-r--r-- 1 agazzarini agazzarini  820714274 Aug 13 16:19 _mr.tis
-rw-r--r-- 1 agazzarini agazzarini3504631 Aug 13 16:20 _mr.tvd
-rw-r--r-- 1 agazzarini agazzarini  288506509 Aug 13 16:20 _mr.tvf
-rw-r--r-- 1 agazzarini agazzarini   28038420 Aug 13 16:20 _mr.tvx
-rw-r--r-- 1 agazzarini agazzarini287 Aug 13 14:26 segments_g
-rw-r--r-- 1 agazzarini agazzarini 20 Aug 13 16:20 segments.gen
-rw-r--r-- 1 agazzarini agazzarini287 Aug 13 16:20 segments_i

if I execute a query sorting by title_sort, on the admin page (#cache) I 
see the field cache populated:


entry count 1
entry#0 : 
'MMapIndexInput(path="/home/agazzarini/solr-indexes/master-data-dir/cbt/main/data/index/_mr.frq")'=>'title_sort',class 



So,

1. _mr.* is the only segment I have on the slave... and I would expect to
   find only that on the slave
2. _mp.* is in the data dir of the master, and I see something that
   refers to it in the slave admin console... could this be the reason
   title_sort is doubled in memory?
3. _mn.*: no idea what this is. FieldCacheImpl holds a third reference
   to the title_sort values for this _mn


On 08/13/2013 05:51 PM, Andrea Gazzarini wrote:

Hi,
I'm getting some Out of Memory (heap space) errors from my Solr instance and, 
after investigating a little bit, I found several threads about 
sorting behaviour in SOLR.


First, some information about the environment

- I'm using SOLR 3.6.1 and master / slave architecture with 1 master 
and 2 slaves.
- All of them have Xms and Xmx set to 4GB, index is about 10GB for 
about 1.800.000 documents.

- Indexes are updated (and therefore replicated once in a day)

After the first OOM I saw the corresponding dump in Memory Analyzer 
and I found a BIG org.apache.lucene.search.FieldCacheImpl instance 
(more than 2GB)... I exploded its internal structure and realized that 
I had a lot of very long sort fields (book titles which were composed 
of title + subtitle + author concatenated)... so, what did I do? Basically 
I reduced the length of that field (it is now composed only of the first 
title) so now I have a more limited number of unique values.


Now, 5 hours ago

- I took the production SOLR log and I extra

Re: Problem escaping ampersands in HTTP GET

2013-08-13 Thread John Randall
Already fixed thanks to Shawn Heisey. Had to URL encode the ampersand and 
semicolon in &amp;.
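
Concretely (illustrative values): to get a field value like "Jack & Jill" through
an HTTP GET, the ampersand first has to be XML-escaped to &amp; inside the
document markup, and then that whole string has to be URL-encoded when it goes
on the URL, so & becomes %26 and ; becomes %3B -- i.e. the value travels as
Jack%20%26amp%3B%20Jill. Without the URL encoding, the raw & is treated as a
parameter separator and the entity gets cut in half, which produces the syntax
error.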




From: Erik Hatcher 
To: "solr-user@lucene.apache.org"  
Sent: Monday, August 12, 2013 9:06 PM
Subject: Re: Problem escaping ampersands in HTTP GET


How are you indexing documents?

You need to either URL encode things or XML encode, sounds like. 

  Erik

On Aug 12, 2013, at 14:49, John Randall  wrote:

> I am using an HTTP GET to add docs to Solr. All the docs load fine as long as 
> none contain an ampersand. I get a syntax error when a doc contains a field, 
> for example, with the phrase "Jack & Jill". 
>  
> How can I escape the ampersand so that the doc loads normally?
>  
> Thanks in advance.

Re: Split Shard Error - maxValue must be non-negative

2013-08-13 Thread Shalin Shekhar Mangar
Scratch that. I obviously didn't pay attention to the stack trace.
There is no workaround until 4.5 for this issue because we split the
range by half and thus cannot guarantee that all segments will have
numDocs > 0.

On Tue, Aug 13, 2013 at 9:25 PM, Shalin Shekhar Mangar
 wrote:
> On Tue, Aug 13, 2013 at 9:15 PM, Robert Muir  wrote:
>> On Tue, Aug 13, 2013 at 11:39 AM, Shalin Shekhar Mangar
>>  wrote:
>>> The splitting code calls commit before it starts the splitting. It creates
>>> a LiveDocsReader using a bitset created by the split. This reader is merged
>>> to an index using addIndexes.
>>>
>>> Shouldn't the addIndexes code then ignore all such 0-document segments?
>>>
>>>
>>
>> Not in 4.4: https://issues.apache.org/jira/browse/LUCENE-5116
>
>
> Sorry, I didn't notice that. So 4.4 users must call commit/optimize
> with expungeDeletes="true" until 4.5 is released if they run into this
> problem.
>
> --
> Regards,
> Shalin Shekhar Mangar.



-- 
Regards,
Shalin Shekhar Mangar.


Re: Do docValues influence range faceting speed in solr?

2013-08-13 Thread Chris Hostetter

: I don't think so. I looked at sources - range and query facets are backed
: on SolrIndexSearcher.numDocs(Query, DocSet).

on fields that use docValues, range queries (regardless 
of whether they come from q, fq, facet.query, or facet.range) are 
sometimes implemented using the docValues via that FieldType's 
getRangeQuery().  

The specifics of when the docValues are used depend on other field 
type properties (ie: multiValued? , indexed? , numeric?) but the bottom 
line is if you do range faceting on a field, the same query will be used 
for filtering on that field, so you should get a filterCache hit.


-Hoss


Re: Facet field display name

2013-08-13 Thread Jason Hellman
It's been my experience that using the convenient feature to change the output 
key still doesn't save you from having to map it back to the field name 
underlying it in order to trigger the filter query.  With that in mind it just 
makes more sense to me to leave the effort in the View portion of the design.  
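
For example (field and values illustrative), the key trick looks like this:

facet.field={!key=Area}omrade

The facet block in the response is then labelled "Area", but when the user
clicks a value the filter still has to go back against the real field, e.g.
fq=omrade:"Nordland" -- which is exactly the mapping step mentioned above.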

On Aug 12, 2013, at 6:34 AM, Peter Sturge  wrote:

> 2c worth,
> We do lots of facet lookups to allow 'prettyprint' versions of facet names.
> We do this on the client-side, though. The reason is then the lookups can
> be different for different locations/users etc. - makes it easy for
> localization.
> It's also very easy to implement such a lookup, without having to disturb
> the innards of Solr...
> 
> 
> 
> On Mon, Aug 12, 2013 at 2:25 PM, Erick Erickson 
> wrote:
> 
>> Have you seen the "key" parameter here:
>> 
>> http://wiki.apache.org/solr/SimpleFacetParameters#key_:_Changing_the_output_key
>> 
>> it allows you to label the output key anything you want, and since these
>> are
>> field names, this seems do-able.
>> 
>> Best,
>> Erick
>> 
>> 
>> On Mon, Aug 12, 2013 at 4:02 AM, Aleksander Akerø >> wrote:
>> 
>>> Hi
>>> 
>>> I wondered if there was some way to configure a display name for facet
>>> fields. Either that or some way to display nordic letters without it
>>> messing up the faceting.
>>> 
>>> Say I wanted a facet field called "område" (Norwegian, "area" in
>> English).
>>> Then I would have to create the field something like this in schema.xml:
>>> 
>>> >> required="false" />
>>> 
>>> But then I would have to do a replace to show a "prettier" name in
>>> frontend. It would be preferred not to do this sort of hardcoding, as I
>>> would have to do this for all the facet fields.
>>> 
>>> 
>>> Either that or I could try encoding the 'å' like this:
>>> 
>>> >> required="false" />
>>> 
>>> Then it will show up with a pretty name, but the faceting will fail.
>> Maybe
>>> this is due to encoding issues, seen as the frontend is encoded with
>>> ISO-8859-1?
>>> 
>>> 
>>> So does anyone have a good practice for either getting this sort of
>> problem
>>> working properly. Or a way to define an alternative "display name" for a
>>> facet field, that I could display instead of the field.name?
>>> 
>>> 
>>> *Aleksander Akerø*
>>> Systemkonsulent
>>> Mobil: 944 89 054
>>> E-post: aleksan...@gurusoft.no
>>> 
>>> *Gurusoft AS*
>>> Telefon: 92 44 09 99
>>> Østre Kullerød
>>> www.gurusoft.no
>>> 
>> 



autocomplete feature - where to begin

2013-08-13 Thread Mysurf Mail
I have indexed the data from the db and so far it searches really well.
Now I want to create an auto-complete/suggest feature on my website.
So far I have seen articles about Suggester, spellchecker, and
searchComponents.
Can someone point me to a good article about basic autocomplete
implementation?


Re: Ping request uses wrong default search field?

2013-08-13 Thread Chris Hostetter

: Not sure if this is a bug or intended behaviour, but the ping query seems to
: rely on the value of the default "df" value in the requestHandler, rather than
: on the core's defaultSearchField defined in schema.xml.

The df *request* param will always override the defaultSearchField in the 
schema.

"defaults" you specify on handlers in solrconfig.xml are default *request* 
params you want that handler to use unless the param with the same name is 
specified as part of the request.

So if you have a "defaults" df in your solrconfig.xml it's going to 
override the defaultSearchField in schema.xml
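
In other words, with a ping handler defined along these lines (values
illustrative), the ping query uses df=text no matter what defaultSearchField
says:

<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <lst name="invariants">
    <str name="q">solrpingquery</str>
  </lst>
  <lst name="defaults">
    <str name="df">text</str>
  </lst>
</requestHandler>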


-Hoss


solr not writing logs when it runs not from its main folder

2013-08-13 Thread Mysurf Mail
When I run solr using

java -jar "C:\solr\example\start.jar"

It writes logs to C:\solr\example\logs.

When I run it using

java -Dsolr.solr.home="C:\solr\example\solr"
 -Djetty.home="C:\solr\example"
 -Djetty.logs="C:\solr\example\logs"
 -jar "C:\solr\example\

start.jar"

it writes logs only if I run it from

C:\solr\example>

From any other folder, logs are not written.
This is important as I need to run it as a service later (using nssm). What
should I change?


Re: Split Shard Error - maxValue must be non-negative

2013-08-13 Thread Shalin Shekhar Mangar
On Tue, Aug 13, 2013 at 9:15 PM, Robert Muir  wrote:
> On Tue, Aug 13, 2013 at 11:39 AM, Shalin Shekhar Mangar
>  wrote:
>> The splitting code calls commit before it starts the splitting. It creates
>> a LiveDocsReader using a bitset created by the split. This reader is merged
>> to an index using addIndexes.
>>
>> Shouldn't the addIndexes code then ignore all such 0-document segments?
>>
>>
>
> Not in 4.4: https://issues.apache.org/jira/browse/LUCENE-5116


Sorry, I didn't notice that. So 4.4 users must call commit/optimize
with expungeDeletes="true" until 4.5 is released if they run into this
problem.
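
For the record, that workaround can be issued as a plain update request,
something like:

curl "http://localhost:8983/solr/collection1/update?commit=true&expungeDeletes=true"

or as the XML command <commit expungeDeletes="true"/> (host, port, and
collection name are whatever your setup uses).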

-- 
Regards,
Shalin Shekhar Mangar.


SOLR memory usage (sort fields? replication?)

2013-08-13 Thread Andrea Gazzarini

Hi,
I'm getting some Out of Memory (heap space) errors from my Solr instance and, 
after investigating a little bit, I found several threads about sorting 
behaviour in SOLR.


First, some information about the environment

- I'm using SOLR 3.6.1 and master / slave architecture with 1 master and 
2 slaves.
- All of them have Xms and Xmx set to 4GB, index is about 10GB for about 
1.800.000 documents.

- Indexes are updated (and therefore replicated once in a day)

After the first OOM I saw the corresponding dump in Memory Analyzer and 
I found a BIG org.apache.lucene.search.FieldCacheImpl instance (more 
than 2GB)... I exploded its internal structure and realized that I had a 
lot of very long sort fields (book titles which were composed of title + 
subtitle + author concatenated)... so, what did I do? Basically I reduced 
the length of that field (it is now composed only of the first title) so 
now I have a more limited number of unique values.


Now, 5 hours ago

- I took the production SOLR log and I extracted something about 20.000 
(real) queries
- I started the master, slaves and reindexed all documents, after a 
little index has been replicated on slaves.
- I started solrmeter that is randonmly querying slaves (using the 
extracted queries)
- After two hours the memory consumption peak was (jvisualvm) about 2GB; 
every (more or less) 5 minutes GC freed about 500MB... constantly.
- I indexed 4000 documents; 10 minutes after replication the whole 
memory consumption had shifted up completely... min peak 2GB, 
max 2.6GB.
- After two hours I indexed more documents (4000) and now I have a min 
peak of 2.6GB and a max of 3.4GB... and it is still slowly growing...


Note that the number of newly indexed documents is not so relevant (4000 
on a total of 1.800.000)


Now, using JConsole I see

- a PS Eden space which is periodically cleaned (it's responsible for the 
wave between the min and the max usage)

- a PS Survivor space which is very low (16MB)
- a PS Old Gen which is set to 2.6GB and it's growing, very slowly but 
it's still growing...


Now, the question...

I generated another dump and, as expected, most of the usage is 
still in org.apache.lucene.search.FieldCacheImpl. Of course, the size 
is now about 980MB (initially it was more than 2GB) which seems good (at 
least better than the initial situation). Most of those 980MB 
are still occupied by sort fields.


What I'm not understanding is how sort fields are loaded in memory...
I mean, I read that in order to optimize sorting, SOLR needs to load all 
values of sort fields; ok, that's good. But why do I see several 
WeakHashMaps that contain different Entry references with the same sort 
field (and its values)?


For example, for title_sort (unique values: 1.432.000) I have two 
(different, i.e. not the same reference) Entry objects with


- a key "title_sort"
- and a value (org.apache.lucene.search.FieldCache$StringIndex) which 
has an int array [1.432.000] and a String array of more or less the same size


So the memory usage (in this case) is doubled...are sort field values 
loaded in memory more than once? How many times?


Best and as usual, sorry for the long email
Andrea


Re: Split Shard Error - maxValue must be non-negative

2013-08-13 Thread Robert Muir
On Tue, Aug 13, 2013 at 11:39 AM, Shalin Shekhar Mangar
 wrote:
> The splitting code calls commit before it starts the splitting. It creates
> a LiveDocsReader using a bitset created by the split. This reader is merged
> to an index using addIndexes.
>
> Shouldn't the addIndexes code then ignore all such 0-document segments?
>
>

Not in 4.4: https://issues.apache.org/jira/browse/LUCENE-5116


Re: Split Shard Error - maxValue must be non-negative

2013-08-13 Thread Shalin Shekhar Mangar
The splitting code calls commit before it starts the splitting. It creates
a LiveDocsReader using a bitset created by the split. This reader is merged
to an index using addIndexes.

Shouldn't the addIndexes code then ignore all such 0-document segments?


On Tue, Aug 13, 2013 at 6:08 PM, Robert Muir  wrote:

> Well, I meant before, but I just took a look and this is implemented
> differently than the "merge" one.
>
> In any case, I think it's the same bug, because I think the only way
> this can happen is if somehow this splitter is trying to create a
> 0-document "split" (or maybe a split containing all deletions).
>
> On Tue, Aug 13, 2013 at 8:22 AM, Srivatsan 
> wrote:
> > Ya, I am performing a commit after the split request is submitted to the server.
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/Split-Shard-Error-maxValue-must-be-non-negative-tp4084220p4084256.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr 4.4 Cloud always indexing to only one shard

2013-08-13 Thread Daniel Collins
I think I see the confusion.  Erick is right that using the Collections API
would sort out the problem, but here is my rationale on why the confusion
exists.

There are 3 stages to creating a valid collection (well this is how I think
of it)

1) Upload a solrconfig.xml/schema.xml (+ any other required config) to ZK,
2) Create the "logical" collection; this step divides up the hash range
of unique IDs across the shards, and needs to know numShards for the
collection in order to divide up the complete hash range into the relevant
sections.
3) Create cores and assign them to each shard, etc.

Prasi, you are doing step 1 and then step 3, but because you never use the
Collections API, you are never doing step 2.  Hence Solr has no idea how
many shards you have in your collection, so it assumes 1 when you create
your first core.  Hence everything else after this is just a replica of
that core.

ZkCli can do step 1, and the CoreAdmin API can be used for step 3 (we used it
only because the Collections API didn't use to support things like dataDir;
with discovery mode this becomes less important, as we can deploy
core.properties files at will) *but* only if you've already defined the
number of shards for your collection, either on the command line or through
use of the Collections API.

But (and this is the important bit), the way to do step 2 properly is via the
Collections API.  It is also possible to start a Solr instance and supply
numShards to it via a system property (-DnumShards=X) and that seems
to work, but I don't think that is the recommended approach.
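
For completeness, step 2 (and 3) done via the Collections API would look
something like this for the setup in this thread (host/port and names taken
from the earlier messages; maxShardsPerNode is needed here because 4 shards
have to fit on 2 nodes):

http://localhost:8080/solr/admin/collections?action=CREATE&name=firstcollection&numShards=4&replicationFactor=1&maxShardsPerNode=2&collection.configName=solrconf1

That single call carves the hash range into 4 shards and creates the cores on
the live nodes, so the separate CoreAdmin CREATE calls are no longer needed.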



On 13 August 2013 13:05, Erick Erickson  wrote:

> Again, why are you using ...admin/cores rather than admin/collections?
>
> See:
>
> http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API
>
> Best
> Erick
>
>
> On Tue, Aug 13, 2013 at 5:00 AM, Prasi S  wrote:
>
> > I create a collection prior to tomcat startup.
> >
> > -->java -classpath .;zoo-lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig
> > -zkhost localhost:2181 -confdir solr-conf -confname solrconf1
> >
> > -->java -classpath .;zoo-lib/* org.apache.solr.cloud.ZkCLI -cmd
> linkconfig
> > -zkhost 127.0.0.1:2181 -collection firstcollection -confname solrconf1
> > -solrhome ../solr_instances/solr
> >
> > 1. Start Zookeeper server
> > 2. Link the configuaration to the collection
> > 3. Check those in ZooClient
> > 4. Start tomcats
> > 5. Create cores and assign to collections.
> >
> >
> >
> http://localhost:8080/solr/admin/cores?action=CREATE&name=mycore_sh1&collection=firstcollection&shard=shard1
> >
> > Are these ok or am I making a mistake?
> >
> >
> > On Mon, Aug 12, 2013 at 6:49 PM, Erick Erickson  > >wrote:
> >
> > > Why are you using the core creation commands rather than the
> > > collection commands? The latter are intended for SolrCloud...
> > >
> > > Best
> > > Erick
> > >
> > >
> > > On Mon, Aug 12, 2013 at 4:51 AM, Prasi S  wrote:
> > >
> > > > Hi,
> > > > I have setup solrcloud in solr 4.4, with 2 solr's in 2 tomcat servers
> > and
> > > > Zookeeper.
> > > >
> > > > I set up Zookeeper with a collection "firstcollection" and then I give
> > the
> > > > below command
> > > >
> > > >
> > > >
> > >
> >
> http://localhost:8080/solr/admin/cores?action=CREATE&name=mycore_sh1&collection=firstcollection&shard=shard1
> > > >
> > > > Similarly, i create 4 shards. 2 shard in the first instance and two
> > > shards
> > > > in the second instance.
> > > >
> > > > When i index files to
> > > >
> > >
> >
> http://localhost:8080/solr/firstcollection/dataimport?command=full-import,
> > > > the data always gets indexed to the shard1.
> > > >
> > > > There are no documents in shard2, 3 ,4. I checked this with
> > > >
> > > > http://localhost:8080/solr/firstcollection/select?q=*:*&fl=[shard]
> > > >
> > > > But searching across any of the two gives full results. Is this a
> > problem
> > > > with the 4.4 version?
> > > >
> > > > Similar scenario: I have tested in Solr 4.0 and it was working
> fine.
> > > >
> > > > Pls help.
> > > >
> > >
> >
>


SOLR4 Spatial sorting and query string

2013-08-13 Thread roySolr
Hello,

I use the following distance sorting of SOLR
4(solr.SpatialRecursivePrefixTreeFieldType):

fl=*,score&sort=score asc&q={!geofilt score=distance filter=false
sfield=coords pt=54.729696,-98.525391 d=10}  

(from the tutorial on
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4)

Now I want to query on a search string and still want to sort on distance.
How can I combine this in the above Solr request? When I add something to the
"q=" it doesn't work. I tried a _query_ subquery and other stuff but I can't
get it working.

I appreciate any help,
Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR4-Spatial-sorting-and-query-string-tp4084318.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Of tlogs and atomic updates

2013-08-13 Thread Yonik Seeley
On Tue, Aug 13, 2013 at 11:01 AM, Anirudha Jadhav  wrote:
> quick question on a similar topic,
>
> for an NRT call to index a doc, a success return code is returned if and only
> if all available servers have successfully written the doc to their tlog.
> Correct?

Right.

-Yonik
http://lucidworks.com


Re: Of tlogs and atomic updates

2013-08-13 Thread Anirudha Jadhav
quick question on a similar topic,

for an NRT call to index a doc, a success return code is returned if and only
if all available servers have successfully written the doc to their tlog.
Correct?


On Tue, Aug 13, 2013 at 10:35 AM, Yonik Seeley  wrote:

> On Tue, Aug 13, 2013 at 10:11 AM, Erick Erickson
>  wrote:
> > A question recently came up: Does the tlog store the entire document when
> > an atomic update happens or just the incoming delta? My guess is that it
> > stores the entire document, but that's a guess...
>
> Correct.
>
> -Yonik
> http://lucidworks.com
>



-- 
Anirudha P. Jadhav


Re: Obtain shard routing key during document insert

2013-08-13 Thread Terry P.
Just a bump to see if anyone knows if this can be done.

We want to get the shard routing key during insert as we have a plugin
operating within the UpdateRequestProcessor that is inserting the original
document being indexed into a resilient backing store so Solr only has to
index it and not store the original.

We appreciate any and all info and ideas anyone may have.



On Wed, Aug 7, 2013 at 5:17 PM, Terry P.  wrote:

> Is it possible to obtain the shard routing key from within an
> UpdateRequestProcessor when a document is being inserted?
>
> Many thanks,
> Terry
>


Re: SolrCloud: Programmatically create multiple collections?

2013-08-13 Thread Anirudha Jadhav
At this point you would need a higher-level service sitting on top of Solr
clusters which also talks to your ZK setup in order to create custom
collections on the fly.

It's not super difficult, but it seems out of scope for SolrCloud right now.

Let me know if others have a different opinion.

thanks,
Ani


On Tue, Aug 13, 2013 at 9:52 AM, Shawn Heisey  wrote:

> On 8/13/2013 3:07 AM, xinwu wrote:
> > When I managed collections via the Collections API.
> > How can I set the 'instanceDir' name?
> > eg:
> http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=4
> > My instanceDir is 'mycollection_shard2_replica1'.
> > How can I change it to 'mycollection'?
>
> I don't think the collections API can do this, and to be honest, I don't
> know why you would want to.  It would make it impossible to have more
> than one shard per Solr node, a capability that many people require.
> The question of "why would you want to?" is something I'm genuinely
> asking here.
>
> Admin URLs accessed directly by client programs are the only logical
> reason I can think of.  For querying and updating the index, you can use
> /solr/mycollection as a base URL to access your index, even though the
> shard names are different.  As for the admin URLs that let you access
> system information, SOLR-4943 will make most of that available without a
> core name in Solr 4.5.  To access core-specific information, you need to
> use the actual core name, but it should be possible to gather
> information about which machine has which core in an automated way.
>
> That said, if you create your collection a different way, you should be
> able to do exactly what you want.  What you would want to do is use the
> zkcli command "linkconfig" to link a new collection with an already
> uploaded config set, and then create the individual cores in your
> collection using the CoreAdmin API instead of the Collections API.
>
> http://wiki.apache.org/solr/SolrCloud#Command_Line_Util
> http://wiki.apache.org/solr/SolrCloud#Creating_cores_via_CoreAdmin
>
> Thanks,
> Shawn
>
>


-- 
Anirudha P. Jadhav


Re: Problem running Solr indexing in Amazon EMR

2013-08-13 Thread Michael Della Bitta
If you do end up figuring it out, would you mind letting me know? Right
now, our solution is to use an older version of SolrJ, but that means we
miss out on some of the improvements/bugfixes around aliases.

Thanks,

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Mon, Aug 12, 2013 at 7:21 PM, Dmitriy Shvadskiy wrote:

> Michael,
> We replaced Lucene jars but run into a problem with incompatible version of
> Apache HttpComponents. Still figuring it out.
>
> Dmitriy
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Problem-running-Solr-indexing-in-Amazon-EMR-tp4083636p4084121.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Of tlogs and atomic updates

2013-08-13 Thread Yonik Seeley
On Tue, Aug 13, 2013 at 10:11 AM, Erick Erickson
 wrote:
> A question recently came up: Does the tlog store the entire document when
> an atomic update happens or just the incoming delta? My guess is that it
> stores the entire document, but that's a guess...

Correct.

-Yonik
http://lucidworks.com


Re: Tokenization at query time

2013-08-13 Thread Andrea Gazzarini

Trying...thank you very much!

I'll let you know

Best,
Andrea

On 08/13/2013 04:18 PM, Erick Erickson wrote:

I think you can get what you want by escaping the space with a backslash

YMMV of course.
Erick


On Tue, Aug 13, 2013 at 9:11 AM, Andrea Gazzarini <
andrea.gazzar...@gmail.com> wrote:


Hi Erick,
sorry if that wasn't clear: this is what I'm actually observing in my
application.

I wrote the first post after looking at the explain (debugQuery=true): the
query

q=mag 778 G 69

is translated as follows:


  +((DisjunctionMaxQuery((myfield:mag^3000.0)~0.1)
   DisjunctionMaxQuery((myfield:778^3000.0)~0.1)
   DisjunctionMaxQuery((myfield:g^3000.0)~0.1)
   DisjunctionMaxQuery((myfield:69^3000.0)~0.1))~4)
   DisjunctionMaxQuery((myfield:mag778g69^3.0)~0.1)

It seems that although I declare myfield with this type

/

 
 

 
 
 


Solr is therefore tokenizing it, producing several tokens
(mag, 778, g, 69).

And I can't put double quotes on the query (q="mag 778 G 69") because the
request handler also searches other fields (with different configuration
chains).

As I understand it, the query parser (i.e. at query time) does a whitespace
tokenization on its own before invoking my (query-time) chain. The same
doesn't happen at index time...this is my problem...because at index time
the field is analyzed exactly as I want...but unfortunately I cannot say the
same at query time.

Sorry for my wonderful english, did you get the point?


On 08/13/2013 02:18 PM, Erick Erickson wrote:


On a quick scan I don't see a problem here. Attach
&debug=query to your url and that'll show you the
parsed query, which will in turn show you what's been
pushed through the analysis chain you've defined.

You haven't stated whether you've tried this and it's
not working or you're looking for guidance as to how
to accomplish this so it's a little unclear how to
respond.

BTW, the admin/analysis page is your friend here

Best
Erick


On Mon, Aug 12, 2013 at 12:52 PM, Andrea Gazzarini <
andrea.gazzar...@gmail.com> wrote:

  Clear, thanks for response.

So, if I have two fields


  
  

  
  
  


  
  
  
  
  

  


(first field type *Mag. 78 D 99* becomes *mag78d99* while second field
type ends with several tokens)

And I want to use the same request handler to query against both of them.
I mean I want the user search something like

http///search?q=Mag 78 D 99

and this search should search within both the first (with type1) and
second (with type 2) by matching

- a document which has field_with_type1 equals to *mag78d99* or
- a document which has field_with_type2 that contains a text like "go to
*mag 78*, class *d* and subclass *99*)



  ...
  dismax
  ...
  100%
  
  field_with_type1
  field_with_type_2
  
  ...


is not possible? If so, is possible to do that in some other way?

Sorry for the long email and thanks again
Andrea


On 08/12/2013 04:01 PM, Jack Krupansky wrote:

  Quoted phrases will be passed to the analyzer as one string, so there a

white space tokenizer is needed.

-- Jack Krupansky

-Original Message- From: Andrea Gazzarini
Sent: Monday, August 12, 2013 6:52 AM
To: solr-user@lucene.apache.org
Subject: Re: Tokenization at query time

Hi Tanguy,
thanks for fast response. What you are saying corresponds perfectly with
the behaviour I'm observing.
Now, other than having a big problem (I have several other fields both
in the pf and qf where spaces doesn't matter, field types like the
"text_en" field type in the example schema) what I'm wondering is:

/"The query parser splits the input query on white spaces, and the each
token is analysed according to your configuration"//
/
Is there a valid reason to declare a WhiteSpaceTokenizer in a query
analyzer? If the input query is already parsed (i.e. whitespace
tokenized) what is its effect?

Thank you very much for the help
Andrea

On 08/12/2013 12:37 PM, Tanguy Moal wrote:

  Hello Andrea,

I think you face a rather common issue involving keyword tokenization
and query parsing in Lucene:
The query parser splits the input query on white spaces, and then each
token is analysed according to your configuration.
So those queries with a whitespace won't behave as expected because
each
token is analysed separately. Consequently, the catenated version of
the
reference cannot be generated.
I think you could try surrounding your query with double quotes or
escaping the space characters in your query using a backslash so that
the
whole sequence is analysed in the same analyser and the catenation
occurs.
You should be aware that this approach has a drawback: you will
probably
not be able to combine the search for Mag. 778 G 69 with other words in
other fields unless you are able to identify which spaces are to be
escaped:
For example, if input t

Re: Tokenization at query time

2013-08-13 Thread Erick Erickson
I think you can get what you want by escaping the space with a backslash

YMMV of course.
Erick


On Tue, Aug 13, 2013 at 9:11 AM, Andrea Gazzarini <
andrea.gazzar...@gmail.com> wrote:

> Hi Erick,
> sorry if that wasn't clear: this is what I'm actually observing in my
> application.
>
> I wrote the first post after looking at the explain (debugQuery=true): the
> query
>
> q=mag 778 G 69
>
> is translated as follows:
>
>
>   +((DisjunctionMaxQuery((myfield:mag^3000.0)~0.1)
>   DisjunctionMaxQuery((myfield:778^3000.0)~0.1)
>   DisjunctionMaxQuery((myfield:g^3000.0)~0.1)
>   DisjunctionMaxQuery((myfield:69^3000.0)~0.1))~4)
>   DisjunctionMaxQuery((myfield:mag778g69^3.0)~0.1)
>
> It seems that although I declare myfield with this type
>
> /
>
> 
> 
>
> 
>  generateWordParts="0" generateNumberParts="0"
> catenateWords="0" catenateNumbers="0" 
> catenateAll="1"**splitOnCaseChange="0"
> />
> 
> 
>
> Solr is therefore tokenizing it, producing several tokens
> (mag, 778, g, 69).
>
> And I can't put double quotes on the query (q="mag 778 G 69") because the
> request handler also searches other fields (with different configuration
> chains).
>
> As I understand it, the query parser (i.e. at query time) does a whitespace
> tokenization on its own before invoking my (query-time) chain. The same
> doesn't happen at index time...this is my problem...because at index time
> the field is analyzed exactly as I want...but unfortunately I cannot say the
> same at query time.
>
> Sorry for my wonderful english, did you get the point?
>
>
> On 08/13/2013 02:18 PM, Erick Erickson wrote:
>
>> On a quick scan I don't see a problem here. Attach
>> &debug=query to your url and that'll show you the
>> parsed query, which will in turn show you what's been
>> pushed through the analysis chain you've defined.
>>
>> You haven't stated whether you've tried this and it's
>> not working or you're looking for guidance as to how
>> to accomplish this so it's a little unclear how to
>> respond.
>>
>> BTW, the admin/analysis page is your friend here
>>
>> Best
>> Erick
>>
>>
>> On Mon, Aug 12, 2013 at 12:52 PM, Andrea Gazzarini <
>> andrea.gazzar...@gmail.com> wrote:
>>
>>  Clear, thanks for response.
>>>
>>> So, if I have two fields
>>>
>>> 
>>>  
>>>  
>>>
>>>  
>>>  >>
>>> generateWordParts="0" generateNumberParts="0"
>>>  catenateWords="0" catenateNumbers="0" catenateAll="1"
>>> splitOnCaseChange="0" />
>>>  
>>> 
>>> 
>>>  
>>>  >> mapping="mapping-FoldToASCII.txt"/>
>>>  
>>>  
>>>  
>>>
>>>  
>>> 
>>>
>>> (first field type *Mag. 78 D 99* becomes *mag78d99* while second field
>>> type ends with several tokens)
>>>
>>> And I want to use the same request handler to query against both of them.
>>> I mean I want the user search something like
>>>
>>> http///search?q=Mag 78 D 99
>>>
>>> and this search should search within both the first (with type1) and
>>> second (with type 2) by matching
>>>
>>> - a document which has field_with_type1 equals to *mag78d99* or
>>> - a document which has field_with_type2 that contains a text like "go to
>>> *mag 78*, class *d* and subclass *99*)
>>>
>>>
>>> 
>>>  ...
>>>  dismax
>>>  ...
>>>  100%
>>>  
>>>  field_with_type1
>>>  field_with_type_2
>>>  
>>>  ...
>>> 
>>>
>>> is not possible? If so, is possible to do that in some other way?
>>>
>>> Sorry for the long email and thanks again
>>> Andrea
>>>
>>>
>>> On 08/12/2013 04:01 PM, Jack Krupansky wrote:
>>>
>>>  Quoted phrases will be passed to the analyzer as one string, so there a
 white space tokenizer is needed.

 -- Jack Krupansky

 -Original Message- From: Andrea Gazzarini
 Sent: Monday, August 12, 2013 6:52 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Tokenization at query time

 Hi Tanguy,
 thanks for fast response. What you are saying corresponds perfectly with
 the behaviour I'm observing.
 Now, other than having a big problem (I have several other fields both
 in the pf and qf where spaces doesn't matter, field types like the
 "text_en" field type in the example schema) what I'm wondering is:

 /"The query parser splits the input query on white spaces, and the each
 token is analysed according to your configuration"//
 /
 Is there a valid reason to declare a WhiteSpaceTokenizer in a query
 analyzer? If the input query is already parsed (i.e. whitespace
 tokenized) what is its effect?

 Thank you very much for the help
 Andrea

 On 08/12/2013 12:37 PM, Tanguy Moal wrote:

  Hello Andrea,
> I think you face a rather common issue involving keyword tokenization
> and query parsing in Lucene:
> The query parser splits the input query on white spaces, and 

Of tlogs and atomic updates

2013-08-13 Thread Erick Erickson
A question recently came up: Does the tlog store the entire document when
an atomic update happens or just the incoming delta? My guess is that it
stores the entire document, but that's a guess...
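
(For reference, the "incoming delta" here is an atomic update request like the
one below, where only the changed field is sent; names and values are just an
example.)

curl 'http://localhost:8983/solr/collection1/update?commit=true' \
  -H 'Content-type:application/json' \
  -d '[{"id":"doc1", "price":{"set":9.99}}]'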

Thanks,
Erick


Re: Please add me to the ContributorsGroup. Username is Epping

2013-08-13 Thread Erick Erickson
Done, thanks for helping!


On Tue, Aug 13, 2013 at 9:17 AM, Ann Tran  wrote:

> Dear admin,
>
> Please add me to the ContributorsGroup so that I can add my websites which
> are using Solr and Lucene Java.
>
> Thank you & best regards,
> Ann
>


Please add me to the ContributorsGroup. Username is Epping

2013-08-13 Thread Ann Tran
Dear admin,

Please add me to the ContributorsGroup so that I can add my websites which
are using Solr and Lucene Java.

Thank you & best regards,
Ann


issue with custom tokenizer

2013-08-13 Thread dhaivat dave
Hello All,

I am trying to develop a custom tokeniser (please find the code below) and
found an issue while adding multiple documents one after another.

It works fine when I add the first document, but when I add another document
it does not call the "create" method from SampleTokeniserFactory.java; it
directly calls the reset method and then incrementToken(). Does anyone have an
idea of what's wrong in the code below? Please share your thoughts on
this.

here is the class which extends TokeniserFactory class

=== SampleTokeniserFactory.java

import java.io.Reader;
import java.util.Map;

import org.apache.lucene.analysis.util.TokenizerFactory;
import org.apache.lucene.util.AttributeSource.AttributeFactory;

public class SampleTokeniserFactory extends TokenizerFactory {

    public SampleTokeniserFactory(Map<String, String> args) {
        super(args);
    }

    @Override
    public SampleTokeniser create(AttributeFactory factory, Reader reader) {
        return new SampleTokeniser(factory, reader);
    }
}

here is the class which extends Tokenizer class


package ns.solr.analyser;

import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.util.AttributeSource.AttributeFactory;

public class SampleTokeniser extends Tokenizer {

private List<Token> tokenList = new ArrayList<Token>();

int tokenCounter = -1;

private final CharTermAttribute termAtt =
addAttribute(CharTermAttribute.class);

/**
 * Object that defines the offset attribute
 */
private final OffsetAttribute offsetAttribute = (OffsetAttribute)
addAttribute(OffsetAttribute.class);

/**
 * Object that defines the position attribute
 */
private final PositionIncrementAttribute position =
(PositionIncrementAttribute) addAttribute(PositionIncrementAttribute.class);

public SampleTokeniser(AttributeFactory factory, Reader reader) {
super(factory, reader);
String textToProcess = null;
try {
textToProcess = readFully(reader);
processText(textToProcess);
} catch (IOException e) {
e.printStackTrace();
}

}

public String readFully(Reader reader) throws IOException {
char[] arr = new char[8 * 1024]; // 8K at a time
StringBuffer buf = new StringBuffer();
int numChars;
while ((numChars = reader.read(arr, 0, arr.length)) > 0) {
buf.append(arr, 0, numChars);
}
return buf.toString();
}

public void processText(String textToProcess) {

String wordsList[] = textToProcess.split(" ");

int startOffset = 0, endOffset = 0;

for (String word : wordsList) {

endOffset = word.length();

Token aToken = new Token("Token." + word, startOffset, endOffset);

aToken.setPositionIncrement(1);

tokenList.add(aToken);

startOffset = endOffset + 1;
}
}

@Override
public boolean incrementToken() throws IOException {

clearAttributes();
tokenCounter++;

if (tokenCounter < tokenList.size()) {
Token aToken = tokenList.get(tokenCounter);

termAtt.append(aToken);
termAtt.setLength(aToken.length());
offsetAttribute.setOffset(correctOffset(aToken.startOffset()),
correctOffset(aToken.endOffset()));
position.setPositionIncrement(aToken.getPositionIncrement());
return true;
}

return false;
}

/**
 * close object
 *
 * @throws IOException
 */
public void close() throws IOException {
super.close();
System.out.println("Close method called");

}

/**
 * called when end method gets called
 *
 * @throws IOException
 */
public void end() throws IOException {
super.end();
// setting final offset
System.out.println("end called with final offset");
}

/**
 * method reset the record
 *
 * @throws IOException
 */
public void reset() throws IOException {
super.reset();
System.out.println("Reset Called");
tokenCounter = -1;

}
}


Re: SolrCloud: Programmatically create multiple collections?

2013-08-13 Thread Shawn Heisey
On 8/13/2013 3:07 AM, xinwu wrote:
> When I managed collections via the Collections API.
> How can I set the 'instanceDir' name?
> eg:http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=4
>  
> My instanceDir is 'mycollection_shard2_replica1'.
> How can I change it to 'mycollection'?

I don't think the collections API can do this, and to be honest, I don't
know why you would want to.  It would make it impossible to have more
than one shard per Solr node, a capability that many people require.
The question of "why would you want to?" is something I'm genuinely
asking here.

Admin URLs accessed directly by client programs are the only logical
reason I can think of.  For querying and updating the index, you can use
/solr/mycollection as a base URL to access your index, even though the
shard names are different.  As for the admin URLs that let you access
system information, SOLR-4943 will make most of that available without a
core name in Solr 4.5.  To access core-specific information, you need to
use the actual core name, but it should be possible to gather
information about which machine has which core in an automated way.

That said, if you create your collection a different way, you should be
able to do exactly what you want.  What you would want to do is use the
zkcli command "linkconfig" to link a new collection with an already
uploaded config set, and then create the individual cores in your
collection using the CoreAdmin API instead of the Collections API.

http://wiki.apache.org/solr/SolrCloud#Command_Line_Util
http://wiki.apache.org/solr/SolrCloud#Creating_cores_via_CoreAdmin
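
Roughly speaking, the steps could look like this (hosts, ports and names are
placeholders only):

# link an already-uploaded config set to the new collection name
java -classpath .:zoo-lib/* org.apache.solr.cloud.ZkCLI -cmd linkconfig \
     -zkhost localhost:2181 -collection mycollection -confname myconf

# then create each core, with whatever core/instanceDir name you want, via CoreAdmin
http://localhost:8983/solr/admin/cores?action=CREATE&name=mycollection&collection=mycollection&shard=shard1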

Thanks,
Shawn



SOLR Setup in Websphere

2013-08-13 Thread Thirukumaran - Mariappan
Hi,

I recently tried setting up Solr  in Tomcat. It works well without issues.

I tried setting up Solr 3.6.2 in WebSphere 7.0.0.25 by deploying the solr
war available in the dist folder. But after starting the Solr instance in
WAS, I am unable to view the Solr home page. It throws a JSP processing error
stating that Java methods used in the default Solr JSPs cannot be resolved.

Have referred the following forum as well..
http://wiki.apache.org/solr/SolrWebSphere

Below mentioned xml files are already shipped with the Solr 3.6.2
WEB-INF/ibm-web-bnd.xmi
WEB-INF/ibm-web-ext.xmi

Can any one please suggest the procedure to setup Solr in Websphere
application server ?

-- 
By
M.Thirukumaran


Re: Tokenization at query time

2013-08-13 Thread Andrea Gazzarini

Hi Erick,
sorry if that wasn't clear: this is what I'm actually observing in my 
application.


I wrote the first post after looking at the explain (debugQuery=true): 
the query


q=mag 778 G 69

is translated as follows:

  +((DisjunctionMaxQuery((myfield:mag^3000.0)~0.1)
  DisjunctionMaxQuery((myfield:778^3000.0)~0.1)
  DisjunctionMaxQuery((myfield:g^3000.0)~0.1)
  DisjunctionMaxQuery((myfield:69^3000.0)~0.1))~4)
  DisjunctionMaxQuery((myfield:mag778g69^3.0)~0.1)

It seems that although I declare myfield with this type

/








Solr is therefore tokenizing it, producing several tokens (mag, 778, g, 69).

And I can't put double quotes on the query (q="mag 778 G 69") because
the request handler also searches other fields (with different
configuration chains).


As I understand it, the query parser (i.e. at query time) does a whitespace
tokenization on its own before invoking my (query-time) chain. The same
doesn't happen at index time...this is my problem...because at index
time the field is analyzed exactly as I want...but unfortunately I cannot
say the same at query time.


Sorry for my wonderful english, did you get the point?

On 08/13/2013 02:18 PM, Erick Erickson wrote:

On a quick scan I don't see a problem here. Attach
&debug=query to your url and that'll show you the
parsed query, which will in turn show you what's been
pushed through the analysis chain you've defined.

You haven't stated whether you've tried this and it's
not working or you're looking for guidance as to how
to accomplish this so it's a little unclear how to
respond.

BTW, the admin/analysis page is your friend here

Best
Erick


On Mon, Aug 12, 2013 at 12:52 PM, Andrea Gazzarini <
andrea.gazzar...@gmail.com> wrote:


Clear, thanks for response.

So, if I have two fields


 
 

 
 
 


 
 
 
 
 
 


(first field type *Mag. 78 D 99* becomes *mag78d99* while second field
type ends with several tokens)

And I want to use the same request handler to query against both of them.
I mean I want the user search something like

http///search?q=Mag 78 D 99

and this search should search within both the first (with type1) and
second (with type 2) by matching

- a document which has field_with_type1 equals to *mag78d99* or
- a document which has field_with_type2 that contains a text like "go to
*mag 78*, class *d* and subclass *99*)



 ...
 dismax
 ...
 100%
 
 field_with_type1
 field_with_type_2
 
 ...


is not possible? If so, is possible to do that in some other way?

Sorry for the long email and thanks again
Andrea


On 08/12/2013 04:01 PM, Jack Krupansky wrote:


Quoted phrases will be passed to the analyzer as one string, so there a
white space tokenizer is needed.

-- Jack Krupansky

-Original Message- From: Andrea Gazzarini
Sent: Monday, August 12, 2013 6:52 AM
To: solr-user@lucene.apache.org
Subject: Re: Tokenization at query time

Hi Tanguy,
thanks for fast response. What you are saying corresponds perfectly with
the behaviour I'm observing.
Now, other than having a big problem (I have several other fields both
in the pf and qf where spaces doesn't matter, field types like the
"text_en" field type in the example schema) what I'm wondering is:

/"The query parser splits the input query on white spaces, and the each
token is analysed according to your configuration"//
/
Is there a valid reason to declare a WhiteSpaceTokenizer in a query
analyzer? If the input query is already parsed (i.e. whitespace
tokenized) what is its effect?

Thank you very much for the help
Andrea

On 08/12/2013 12:37 PM, Tanguy Moal wrote:


Hello Andrea,
I think you face a rather common issue involving keyword tokenization
and query parsing in Lucene:
The query parser splits the input query on white spaces, and then each
token is analysed according to your configuration.
So those queries with a whitespace won't behave as expected because each
token is analysed separately. Consequently, the catenated version of the
reference cannot be generated.
I think you could try surrounding your query with double quotes or
escaping the space characters in your query using a backslash so that the
whole sequence is analysed in the same analyser and the catenation occurs.
You should be aware that this approach has a drawback: you will probably
not be able to combine the search for Mag. 778 G 69 with other words in
other fields unless you are able to identify which spaces are to be escaped:
For example, if input the query is:
Awesome Mag. 778 G 69
you would want to transform it to:
Awesome Mag.\ 778\ G\ 69 // spaces are escaped in the reference only
or
Awesome "Mag. 778 G 69" // only the reference is turned into a phrase
query

Do you get the point?

Look at the differences between what you tried and the following
examples which should all do what yo

Re: Split Shard Error - maxValue must be non-negative

2013-08-13 Thread Robert Muir
Well, i meant before, but i just took a look and this is implemented
differently than the "merge" one.

In any case, i think its the same bug, because I think the only way
this can happen is if somehow this splitter is trying to create a
0-document "split" (or maybe a split containing all deletions).

On Tue, Aug 13, 2013 at 8:22 AM, Srivatsan  wrote:
> Ya i am performing commit after split request is submitted to server.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Split-Shard-Error-maxValue-must-be-non-negative-tp4084220p4084256.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Ping request uses wrong default search field?

2013-08-13 Thread Erick Erickson
1> The defaultSearchField in schema.xml is deprecated
2> the 'df' parameter will _probably_ override it anyway.
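
One way to pin this down explicitly is to give the ping handler its own df in
solrconfig.xml; a sketch (the field name is only an example):

<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <lst name="invariants">
    <str name="q">solrpingquery</str>
  </lst>
  <lst name="defaults">
    <str name="df">foobar</str>
  </lst>
</requestHandler>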

Best
Erick


On Tue, Aug 13, 2013 at 4:16 AM, Bram Van Dam  wrote:

> Addendum: using Solr 4.3.1
>


Re: Setting hostPort in System properties

2013-08-13 Thread Erick Erickson
Unless this is a copy/paste error, it's just wrong 

-DjhostPort=8080

jhostPort?

But more to the point, your specification is wrong. The
sysprop that would be substituted is "port" since that's what's
in ${port:}.

Try:
${hostPort:}
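
Whatever style of solr.xml you use, the property name inside ${...} has to
match the -D flag exactly; e.g. with the new-style solr.xml (a sketch, values
are only examples):

<solrcloud>
  <str name="host">${host:}</str>
  <int name="hostPort">${hostPort:8080}</int>
  <str name="hostContext">${hostContext:solr}</str>
</solrcloud>

set JAVA_OPTS=... -Dhost=10.239.30.27 -DhostContext=solr -DhostPort=8080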

Best
Erick


On Tue, Aug 13, 2013 at 3:47 AM, Prasi S  wrote:

> Hi,
> when i set solr hostPort in tomcat system properties, it is not working. If
> I specify that in solr.xml then it is working. Is it mandatory that
> hostPort should be set only in solr.xml?
>
> Solr.xml setting:
>
> 
>
>   
> ${host:}
> *${port:}*
>
> Tomcat runtime setting:
> *
> *
> *set JAVA_OPTS=-Dprogram.name=%PROGNAME%
> -Dlogging.configuration=file:%DIRNAME%logging.properties -DhostContext=solr
> -Dhost=10.239.30.27 -DjhostPort=8080
> *
> *
> *
> Thanks,
> Prasi
>


Re: I have tried to use Solr 4.4 but some problems happened. need your help

2013-08-13 Thread Erick Erickson
It looks like you have older jar files in your classpath, evidenced
by the line:
Caused by: java.lang.ClassCastException: class
org.apache.solr.handler.dataimport.DataImportHandler

bq: Originally, there is not the lib folder under solr, so I created it for
adding several jar files.

This is really suspicious. Lots of Solr wouldn't work if there
were no lib directory, so I suspect you have jar files in two or more
places and that they're different versions, but that's just a guess.
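
If the goal was just to get the DIH jars onto the classpath, an alternative to
a solr/lib folder is a <lib> directive in solrconfig.xml (the paths below are
only placeholders; adjust them to your own layout):

<lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar" />
<lib dir="/path/to/jdbc/driver/" regex="ojdbc.*\.jar" />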

Best
Erick


On Mon, Aug 12, 2013 at 9:07 PM, Rex  wrote:

> I have been stuck on the problem below for a few days. If someone has
> experienced the same problem, please give a hint. Thank you.
>
> Originally, there is not the lib folder under solr, so I created it for
> adding several jar files. ( I already used the ext folder which located in
> example/lib/ext. but it doesn't work)
>
> I have added some jars files under solr/lib, especially  ojdbc5.jar,
> solr-common-1.3.0.jar, solr-core-4.4.0.jar and
> solr-dataimporthandler-4.4.0.jar
>
>  In solrconfig.xml
> adding below
> 
>
>  class="org.apache.solr.handler.dataimport.DataImportHandler">
> 
> db-data-config.xml
> 
> 
>
>
>
> > in schema.xml
> adding below
>  bId
>
>   required="true" />
> required="true" />
>
>
>
>
>
>
> >under db-data-config.xml
>
> 
>  name="datasource-oracle"
> driver="oracle.jdbc.driver.OracleDriver"
> url="jdbc:oracle:thin:@xxx.xxx.xx.xx::"
> user=""
> password="X" />
>
>
>
>
> 
>  transformer="ClobTransformer, HTMLStripTransformer, script:BoostDoc"
> query="
> SELECT
> B_ID,
> BI_ID,
> SUBJECT,
> CONTENT,
> CRE_DT,
> FILE_NAME,
> FILE_SIZE,
> FILE_SYS_DIR,
> FILE_SYS_NAME,
>
> FROM XX
> "
> >
> 
> 
> 
> 
> 
> 
> 
> 
> 
>
> 
> 
>
> 
>
> =  E R R O R === M E S S A G E
> 
>
> 3645 [coreLoadExecutor-3-thread-1] ERROR org.apache.solr.core.CoreContainer
> ? Unable to create core: collection1
> org.apache.solr.common.SolrException: RequestHandler init failure
> at org.apache.solr.core.SolrCore.(SolrCore.java:835)
> at org.apache.solr.core.SolrCore.(SolrCore.java:629)
> at
> org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:622)
> at
> org.apache.solr.core.CoreContainer.create(CoreContainer.java:657)
> at
> org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364)
> at
> org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356)
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:724)
> Caused by: org.apache.solr.common.SolrException: RequestHandler init
> failure
> at
>
> org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:167)
> at org.apache.solr.core.SolrCore.(SolrCore.java:772)
> ... 13 more
> Caused by: org.apache.solr.common.SolrException: Error Instantiating
> Request
> Handler, org.apache.solr.handler.dataimport.DataImportHandler failed to
> instantiate org.apache.solr.request.SolrRequestHandler
> at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:551)
> at
> org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:603)
> at
>
> org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:153)
> ... 14 more
> Caused by: java.lang.ClassCastException: class
> org.apache.solr.handler.dataimport.DataImportHandler
> at java.lang.Class.asSubclass(Class.java:3116)
> at
>
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:433)
> at
>
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:381)
> a

Re: SOLR OR query, want 1 of the 2 results

2013-08-13 Thread Erick Erickson
You can probably make this pretty fast by doing a fq with bbox
to restrict the number of documents that needed their distance
calculated
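
Something along these lines (field name, point and distance are placeholders):

q=*:*&fq={!bbox}&sfield=store&pt=45.15,-93.85&d=5&sort=geodist() asc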

Best
Erick


On Mon, Aug 12, 2013 at 9:13 AM, Raymond Wiker  wrote:

> It will probably have better performance than having a "plan b" query that
> executes if the first query fails...
>
>
> On Mon, Aug 12, 2013 at 2:27 PM, PoM  wrote:
>
> > That would actually be a decent solution, although it isn't the best i
> will
> > try if it gives any performance issues
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/SOLR-OR-query-want-1-of-the-2-results-tp4083957p4083969.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>


Re: very simple boolean query not working

2013-08-13 Thread Erick Erickson
Not quite sure what you're seeing here. adding &debugQuery=true
shows the parsed query, timings, things like that. What it does NOT
do is show you how things were scored.

If you have a document that you think should match, you can add
explainOther to the query and it'll show you how an arbitrary
document would be scored, see:
http://wiki.apache.org/solr/CommonQueryParameters#explainOther
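
For example (the id value is only an illustration, and remember the spaces
still need URL-encoding):

...&q=field1:foo+AND+field2:bar&debugQuery=true&explainOther=id:1234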

Best
Erick


On Mon, Aug 12, 2013 at 1:50 PM, S L  wrote:

> Jack Krupansky-2 wrote
> > Also, be aware that the spaces in your query need to be URL-encoded.
> > Depending on how you are sending the command, you may have to do that
> > encoding yourself.
> >
> > -- Jack Krupansky
>
> It's a good possibility that that's the problem. I've been doing queries in
> different ways (browser, curl, input from other programs...) and haven't
> thought lately about the encoding of spaces. I won't get a chance to check
> the encoding until tonight but I'll report back later.
>
> Thanks very much.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/very-simple-boolean-query-not-working-tp4083895p4084054.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Split Shard Error - maxValue must be non-negative

2013-08-13 Thread Srivatsan
Ya i am performing commit after split request is submitted to server.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Split-Shard-Error-maxValue-must-be-non-negative-tp4084220p4084256.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Tokenization at query time

2013-08-13 Thread Erick Erickson
On a quick scan I don't see a problem here. Attach
&debug=query to your url and that'll show you the
parsed query, which will in turn show you what's been
pushed through the analysis chain you've defined.

You haven't stated whether you've tried this and it's
not working or you're looking for guidance as to how
to accomplish this so it's a little unclear how to
respond.

BTW, the admin/analysis page is your friend here

Best
Erick


On Mon, Aug 12, 2013 at 12:52 PM, Andrea Gazzarini <
andrea.gazzar...@gmail.com> wrote:

> Clear, thanks for response.
>
> So, if I have two fields
>
> 
> 
> 
>
> 
>  generateWordParts="0" generateNumberParts="0"
> catenateWords="0" catenateNumbers="0" catenateAll="1"
> splitOnCaseChange="0" />
> 
> 
> 
> 
>  mapping="mapping-FoldToASCII.**txt"/>
> 
> 
> 
> 
> 
>
> (first field type *Mag. 78 D 99* becomes *mag78d99* while second field
> type ends with several tokens)
>
> And I want to use the same request handler to query against both of them.
> I mean I want the user search something like
>
> http///search?q=Mag 78 D 99
>
> and this search should search within both the first (with type1) and
> second (with type 2) by matching
>
> - a document which has field_with_type1 equals to *mag78d99* or
> - a document which has field_with_type2 that contains a text like "go to
> *mag 78*, class *d* and subclass *99*)
>
>
> 
> ...
> dismax
> ...
> 100%
> 
> field_with_type1
> field_with_type_2
> 
> ...
> 
>
> is not possible? If so, is possible to do that in some other way?
>
> Sorry for the long email and thanks again
> Andrea
>
>
> On 08/12/2013 04:01 PM, Jack Krupansky wrote:
>
>> Quoted phrases will be passed to the analyzer as one string, so there a
>> white space tokenizer is needed.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Andrea Gazzarini
>> Sent: Monday, August 12, 2013 6:52 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Tokenization at query time
>>
>> Hi Tanguy,
>> thanks for fast response. What you are saying corresponds perfectly with
>> the behaviour I'm observing.
>> Now, other than having a big problem (I have several other fields both
>> in the pf and qf where spaces doesn't matter, field types like the
>> "text_en" field type in the example schema) what I'm wondering is:
>>
>> /"The query parser splits the input query on white spaces, and the each
>> token is analysed according to your configuration"//
>> /
>> Is there a valid reason to declare a WhiteSpaceTokenizer in a query
>> analyzer? If the input query is already parsed (i.e. whitespace
>> tokenized) what is its effect?
>>
>> Thank you very much for the help
>> Andrea
>>
>> On 08/12/2013 12:37 PM, Tanguy Moal wrote:
>>
>>> Hello Andrea,
>>> I think you face a rather common issue involving keyword tokenization
>>> and query parsing in Lucene:
>>> The query parser splits the input query on white spaces, and then each
>>> token is analysed according to your configuration.
>>> So those queries with a whitespace won't behave as expected because each
>>> token is analysed separately. Consequently, the catenated version of the
>>> reference cannot be generated.
>>> I think you could try surrounding your query with double quotes or
>>> escaping the space characters in your query using a backslash so that the
>>> whole sequence is analysed in the same analyser and the catenation occurs.
>>> You should be aware that this approach has a drawback: you will probably
>>> not be able to combine the search for Mag. 778 G 69 with other words in
>>> other fields unless you are able to identify which spaces are to be escaped:
>>> For example, if input the query is:
>>> Awesome Mag. 778 G 69
>>> you would want to transform it to:
>>> Awesome Mag.\ 778\ G\ 69 // spaces are escaped in the reference only
>>> or
>>> Awesome "Mag. 778 G 69" // only the reference is turned into a phrase
>>> query
>>>
>>> Do you get the point?
>>>
>>> Look at the differences between what you tried and the following
>>> examples which should all do what you want:
>>> http://localhost:8983/solr/**collection1/select?q=%22Mag.%**
>>> 20778%20G%2069%22&debugQuery=**on&qf=text%20myfield&defType=**dismax
>>> OR
>>> http://localhost:8983/solr/**collection1/select?q=myfield:**Mag
>>> .\%20778\%20G\%2069&**debugQuery=on
>>> OR
>>> http://localhost:8983/solr/**collection1/select?q=Mag
>>> .\%**20778\%20G\%2069&debugQuery=**on&qf=text%20myfield&defType=**edismax
>>>
>>>
>>> I hope this helps
>>>
>>> Tanguy
>>>
>>> On Aug 12, 2013, at 11:13 AM, Andrea Gazzarini <
>>> andrea.gazzar...@gmail.com> wrote:
>>>
>>>  Hi all,
 I have a field (among others)in my schema defined like 

Re: Split Shard Error - maxValue must be non-negative

2013-08-13 Thread Robert Muir
did you do a (real) commit before trying to use this?
I am not sure how this splitting works, but at least the merge option
requires that.

i can't see this happening unless you are somehow splitting a 0
document index (or, if the splitter is creating 0 document splits)
so this is likely just a symptom of
https://issues.apache.org/jira/browse/LUCENE-5116

On Tue, Aug 13, 2013 at 6:46 AM, Srivatsan  wrote:
> Hi,
>
> I am experimenting with solr 4.4.0 split shard feature. When i split the
> shard i am getting the following exception.
>
> /java.lang.IllegalArgumentException: maxValue must be non-negative (got: -1)
> at
> org.apache.lucene.util.packed.PackedInts.bitsRequired(PackedInts.java:1184)
> at
> org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:140)
> at
> org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
> at
> org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112)
> at 
> org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221)
> at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
> at 
> org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:2488)
> at
> org.apache.solr.update.SolrIndexSplitter.split(SolrIndexSplitter.java:125)
> at
> org.apache.solr.update.DirectUpdateHandler2.split(DirectUpdateHandler2.java:766)
> at
> org.apache.solr.handler.admin.CoreAdminHandler.handleSplitAction(CoreAdminHandler.java:284)
> at
> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at
> org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:611)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:209)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> at org.eclipse.jetty.server.Server.handle(Server.java:368)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
> at
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
> at 
> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> at
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> at java.lang.Thread.run(Thread.java:679)/
>
>
> How to resolve this problem?
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Split-Shard-Error-maxValue-must-be-non-negative-tp4084220.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr4 update and query performance question

2013-08-13 Thread Erick Erickson
1> That's hard-coded at present. There's anecdotal evidence that there
 are throughput improvements with larger batch sizes, but no action
 yet.
2> Yep, all searchers are also re-opened, caches re-warmed, etc.
3> Odd. I'm assuming your Solr3 was master/slave setup? Seeing the
queries would help diagnose this. Also, did you try to copy/paste
the configuration from your Solr3 to Solr4? I'd start with the
Solr4 and copy/paste only the parts needed from your Solr3 setup.
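
For point 2, the usual pattern is to let hard commits run on a timer with
openSearcher=false and rely on soft commits for visibility; a sketch (the
times are only examples):

<autoCommit>
  <maxTime>600000</maxTime>          <!-- hard commit every 10 minutes -->
  <openSearcher>false</openSearcher> <!-- don't open a new searcher on hard commit -->
</autoCommit>
<autoSoftCommit>
  <maxTime>60000</maxTime>           <!-- new searcher at most once a minute -->
</autoSoftCommit>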

Best
Erick


On Mon, Aug 12, 2013 at 11:38 AM, Joshi, Shital  wrote:

> Hi,
>
> We have SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes
> with about 450 mil documents (~90 mil per shard). We're loading 1000 or
> less documents in CSV format every few minutes. In Solr3, with 300 mil
> documents, it used to take 30 seconds to load 1000 documents while in
> Solr4, its taking up to 3 minutes to load 1000 documents. We're using
> custom sharding, we include _shard_=shardid parameter in update command.
> Upon looking Solr4 log files we found that:
>
> 1.   Documents are added in a batch of 10 records. How do we increase
> this batch size from 10 to 1000 documents?
>
> 2.  We do hard commit after loading 1000 documents. For every hard
> commit, it refreshes searcher on all nodes. Are all caches also refreshed
> when hard commit happens? We're planning to change to soft commit and do
> auto hard commit every 10-15 minutes.
>
> 3.  We're not seeing improved query performance compared to Solr3.
> Queries which took 3-5 seconds in Solr3 (300 mil docs) are taking 20
> seconds with Solr4. We think this could be due to frequent hard commits and
> searcher refresh. Do you think when we change to soft commit and increase
> the batch size, we will see better query performance.
>
> Thanks!
>
>
>


Re: Extending fieldtypes

2013-08-13 Thread Erick Erickson
This has been mentioned before, but it's never been
implemented. It's a pain to copy/paste the full field
definition, but the utility of "subclassing" fieldTypes
is really pretty restricted. How, for instance, would
you, say, tweak the parameters to WordDelimiterFilterFactory
in your sub-field? And a rule like "you only add stuff to
the end of the chain" is pretty limited.

So copy/paste/edit I'm afraid.

Best
Erick


On Mon, Aug 12, 2013 at 10:19 AM, Bruno René Santos wrote:

> Hi,
>
> Example:
>
>
> I want stringTweakedNoIDF to be a stringTweaked but with the extra
> similarity.
>
>  sortMissingLast="true" positionIncrementGap="100">
>  
> 
>  
> 
>   replacement=" "/>
> 
>  
> 
>  
> 
>  
> 
>   replacement=" "/>
> 
>  
>  ignoreCase="true" expand="true"/>
>  
> 
>  sortMissingLast="true" positionIncrementGap="100">
>  
> 
>  
> 
>  
>  replacement=" "/>
>  
> 
>  
> 
> 
>  
> 
>   replacement=" "/>
> 
>  
>  ignoreCase="true" expand="true"/>
>  
> 
>
> Regards
> Bruno
>
>
> On Mon, Aug 12, 2013 at 3:07 PM, tamanjit.bin...@yahoo.co.in <
> tamanjit.bin...@yahoo.co.in> wrote:
>
> > You would need to provide a Solr file that would be the basic field type
> > and do the rest of the analysis on it. Is this what you want?
> >
> > eg. <fieldType name="textSpellPhrase" class="solr.TextField"
> >      positionIncrementGap="100" stored="false" multiValued="true">
> >   <analyzer>
> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Extending-fieldtypes-tp4083986p4083992.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
>
>
> --
> Bruno René Santos
> Lisboa - Portugal
>


Re: Solr 4.4 Cloud always indexing to only one shard

2013-08-13 Thread Erick Erickson
Again, why are you using ...admin/cores rather than admin/collections?

See:
http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API
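
i.e. for a setup like yours, something along these lines (values are only
illustrative):

http://localhost:8080/solr/admin/collections?action=CREATE&name=firstcollection&numShards=4&replicationFactor=1&maxShardsPerNode=2&collection.configName=solrconf1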

Best
Erick


On Tue, Aug 13, 2013 at 5:00 AM, Prasi S  wrote:

> I create a collection prior to tomcat startup.
>
> -->java -classpath .;zoo-lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig
> -zkhost localhost:2181 -confdir solr-conf -confname solrconf1
>
> -->java -classpath .;zoo-lib/* org.apache.solr.cloud.ZkCLI -cmd linkconfig
> -zkhost 127.0.0.1:2181 -collection firstcollection -confname solrconf1
> -solrhome ../solr_instances/solr
>
> 1. Start Zookeeper server
> 2. Link the configuration to the collection
> 3. Check those in ZooClient
> 4. Start tomcats
> 5. Create cores and assign to collections.
>
>
> http://localhost:8080/solr/admin/cores?action=CREATE&name=mycore_sh1&collection=firstcollection&shard=shard1
>
> Are these ok or am I making a mistake?
>
>
> On Mon, Aug 12, 2013 at 6:49 PM, Erick Erickson  >wrote:
>
> > Why are you using the core creation commands rather than the
> > collection commands? The latter are intended for SolrCloud...
> >
> > Best
> > Erick
> >
> >
> > On Mon, Aug 12, 2013 at 4:51 AM, Prasi S  wrote:
> >
> > > Hi,
> > > I have setup solrcloud in solr 4.4, with 2 solr's in 2 tomcat servers
> and
> > > Zookeeper.
> > >
> > > I setup Zookeeper with a collection "firstcollection" and then i give
> the
> > > below command
> > >
> > >
> > >
> >
> http://localhost:8080/solr/admin/cores?action=CREATE&name=mycore_sh1&collection=firstcollection&shard=shard1
> > >
> > > Similarly, i create 4 shards. 2 shard in the first instance and two
> > shards
> > > in the second instance.
> > >
> > > When i index files to
> > >
> >
> http://localhost:8080/solr/firstcollection/dataimport?command=full-import,
> > > the data always gets indexed to the shard1.
> > >
> > > There are no documents in shard2, 3 ,4. I checked this with
> > >
> > > http://localhost:8080/solr/firstcollection/select?q=*:*&fl=[shard]
> > >
> > > But searching across any of the two gives full results. Is this a
> problem
> > > with 4.4 version.
> > >
> > > I have tested a similar scenario in solr 4.0 and it was working fine.
> > >
> > > Pls help.
> > >
> >
>


Re: SolrCloud - Replica 'down'. How to get it back as 'active'? - Solr 4.3.0

2013-08-13 Thread Erick Erickson
Nobody knows without considerably more details, please review:

http://wiki.apache.org/solr/UsingMailingLists

What do the Solr logs say on that node? You could consider just
blowing away the index on that core and letting it replicate in full
from the leader.

Best
Erick


On Mon, Aug 12, 2013 at 10:28 AM, Jeroen Steggink wrote:

> After two weeks it's still down. What could be the problem?
>
>
> On 31-7-2013 16:40, Anshum Gupta wrote:
>
>> It perhaps is just replaying the transaction logs and coming up. Wait for
>> it is what I'd say.
>> The admin UI as of now doesn't show replaying of transaction log as
>> 'recovering', it does so only during peer sync.
>>
>> Also, you may want to add autoSoftCommit and increase the autoCommit to a
>> few minutes.
>>
>>
>> On Wed, Jul 31, 2013 at 7:55 PM, Jeroen Steggink > >wrote:
>>
>>  Hi,
>>>
>>> After the following error, one of the replicas of the leader went down.
>>> "Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try
>>> again later."
>>> I increased the autoCommit time to 5000ms and restarted Solr.
>>>
>>> However, the status is still set to "down".
>>> How do I get it back to "active"?
>>>
>>> Regards,
>>> Jeroen
>>>
>>>
>>>
>>>
>>
>
>


Re: Collection - loadOnStartup

2013-08-13 Thread Erick Erickson
Frankly, you're into somewhat uncharted waters, the whole lazy core
capability was designed for non-cloud mode.

Cores are initialized when the first request comes in that addresses
the core. Whether ZK and SolrCloud know a core is active before
the first time it's loaded I don't know.

I think you'll find that combining lazy core loading with SolrCloud will
have more issues, you've found two so far and no doubt there
are more lurking. If you'd like to help make SolrCloud work with
transient cores, patches welcome!

I won't have time/motivation to work on this for the foreseeable future,
I'm not even sure it _should_ be supported. So if all you need to
do is wait for a while, you have a work-around. Otherwise, I'm clueless.

Best
Erick


On Mon, Aug 12, 2013 at 8:54 AM, Srivatsan wrote:

> Hi
> I manually edited the core.properties file, setting /loadOnStartup=false/ on
> all cores. But here I am facing a problem. After starting the solr cloud
> servers, I couldn't index to any collection for a particular interval. I am
> getting an exception like "No Live Solr Servers".
>
> => If we submit a index request why does the cores didnt initialize at run
> time?
>
> => How the cores will be initialized if loadOnStartup=false?
>
>
> Thanks
>
> Srivatsan
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Collection-loadOnStartup-tp4082531p4083972.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


replication issue on slaves

2013-08-13 Thread Michael Tsadikov
Hi

We're using good old master-slave replication (not SolrCloud yet)

Since we've upgraded to solr 4.3, replication causes the slave JVM to
hiccup (probably heap & GC issues) *during the download phase*, which
seemed strange to me because download was only supposed to copy files from
the master to a local directory and not do anything else... (I would
understand if this had happened during reopening of index readers...)

I noticed in the log below that it uses NRTCachingDirectory, and it got me
thinking that maybe the slave is misconfigured since we don't need NRT at
all.

Aug 13, 2013 5:55:47 AM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Starting download to *NRTCachingDirectory*
(org.apache.lucene.store.MMapDirectory@/srv/solr/shard-3-0/data/index.20130813055547333
lockFactory=org.apache.lucene.store.NativeFSLockFactory@35463dc3;
maxCacheMB=48.0 maxMergeSizeMB=4.0) fullCopy=false
Aug 13, 2013 5:59:22 AM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Total time taken for download : 214 secs

Our solrconfig contains the following on both master and slaves, but I
thought it is irrelevant to slaves:


<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>

<autoCommit>
  <maxTime>360</maxTime>
  <openSearcher>true</openSearcher>
</autoCommit>

How can I make replication behave in slaves like it did in pre-4.0 versions?

Thanks,
Michael


Split Shard Error - maxValue must be non-negative

2013-08-13 Thread Srivatsan
Hi,

I am experimenting with solr 4.4.0 split shard feature. When i split the
shard i am getting the following exception.

/java.lang.IllegalArgumentException: maxValue must be non-negative (got: -1)
at
org.apache.lucene.util.packed.PackedInts.bitsRequired(PackedInts.java:1184)
at
org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:140)
at
org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
at
org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112)
at 
org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:2488)
at
org.apache.solr.update.SolrIndexSplitter.split(SolrIndexSplitter.java:125)
at
org.apache.solr.update.DirectUpdateHandler2.split(DirectUpdateHandler2.java:766)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleSplitAction(CoreAdminHandler.java:284)
at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:186)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:611)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:209)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:679)/


How to resolve this problem? 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Split-Shard-Error-maxValue-must-be-non-negative-tp4084220.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud: Programmatically create multiple collections?

2013-08-13 Thread xinwu
HI,Mark.
When I managed collections via the Collections API.
How can I set the 'instanceDir' name?
eg:http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=4
 
My instanceDir is 'mycollection_shard2_replica1'.
How can I change it to 'mycollection'?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Programmatically-create-multiple-collections-tp3916927p4084202.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.4 Cloud always indexing to only one shard

2013-08-13 Thread Prasi S
I create a collection prior to tomcat startup.

-->java -classpath .;zoo-lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig
-zkhost localhost:2181 -confdir solr-conf -confname solrconf1

-->java -classpath .;zoo-lib/* org.apache.solr.cloud.ZkCLI -cmd linkconfig
-zkhost 127.0.0.1:2181 -collection firstcollection -confname solrconf1
-solrhome ../solr_instances/solr

1. Start Zookeeper server
2. Link the configuration to the collection
3. Check those in ZooClient
4. Start tomcats
5. Create cores and assign to collections.

http://localhost:8080/solr/admin/cores?action=CREATE&name=mycore_sh1&collection=firstcollection&shard=shard1

Are these ok or am I making a mistake?


On Mon, Aug 12, 2013 at 6:49 PM, Erick Erickson wrote:

> Why are you using the core creation commands rather than the
> collection commands? The latter are intended for SolrCloud...
>
> Best
> Erick
>
>
> On Mon, Aug 12, 2013 at 4:51 AM, Prasi S  wrote:
>
> > Hi,
> > I have setup solrcloud in solr 4.4, with 2 solr's in 2 tomcat servers and
> > Zookeeper.
> >
> > I setup Zookeeper with a collection "firstcollection" and then i give the
> > below command
> >
> >
> >
> http://localhost:8080/solr/admin/cores?action=CREATE&name=mycore_sh1&collection=firstcollection&shard=shard1
> >
> > Similarly, i create 4 shards. 2 shard in the first instance and two
> shards
> > in the second instance.
> >
> > When i index files to
> >
> http://localhost:8080/solr/firstcollection/dataimport?command=full-import,
> > the data always gets indexed to the shard1.
> >
> > There are no documents in shard2, 3 ,4. I checked this with
> >
> > http://localhost:8080/solr/firstcollection/select?q=*:*&fl=[shard]
> >
> > But searching across any of the two gives full results. Is this a problem
> > with 4.4 version.
> >
> > I have tested a similar scenario in solr 4.0 and it was working fine.
> >
> > Pls help.
> >
>


Re: Ping request uses wrong default search field?

2013-08-13 Thread Bram Van Dam

Addendum: using Solr 4.3.1


Ping request uses wrong default search field?

2013-08-13 Thread Bram Van Dam

Hi folks,

Not sure if this is a bug or intended behaviour, but the ping query 
seems to rely on the default "df" value in the requestHandler, rather 
than on the core's defaultSearchField defined in schema.xml.



I would expect the schema.xml values to override solrconfig.xml values. 
Is this intentional? If it's a bug I wouldn't mind having a crack at 
fixing it.
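
In the meantime I can probably work around it by pinning the ping handler to a 
query that does not depend on df at all (just a sketch, assuming the standard 
/admin/ping registration; it sidesteps the symptom rather than the 
df/defaultSearchField precedence itself):

<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <lst name="invariants">
    <!-- a fielded query (or *:*) never consults df -->
    <str name="q">foobar:ping</str>
  </lst>
</requestHandler>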



Thanks,

 - Bram


*schema.xml snippet:*

<defaultSearchField>foobar</defaultSearchField>

*solrconfig.xml snippet:*

<requestHandler ...>
  <lst name="defaults">
    <str name="df">text</str>
    ...
  </lst>
</requestHandler>

*Stack Trace:*

2013-08-13/09:49:28.687/CEST|ERROR|http-apr-8080-exec-4|org.apache.solr.core.SolrCore.log:108|org.apache.solr.common.SolrException: 
undefined field text
at 
org.apache.solr.schema.IndexSchema.getDynamicFieldType(IndexSchema.java:1225)
at 
org.apache.solr.schema.IndexSchema$SolrQueryAnalyzer.getWrappedAnalyzer(IndexSchema.java:425)
at 
org.apache.lucene.analysis.AnalyzerWrapper.initReader(AnalyzerWrapper.java:81)

at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:132)
at 
org.apache.solr.parser.SolrQueryParserBase.newFieldQuery(SolrQueryParserBase.java:408)
at 
org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:966)
at 
org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:765)

at org.apache.solr.parser.QueryParser.Term(QueryParser.java:300)
at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:186)
at org.apache.solr.parser.QueryParser.Query(QueryParser.java:108)
at 
org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:97)
at 
org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:160)
at 
org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:72)

at org.apache.solr.search.QParser.getQuery(QParser.java:142)
at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:117)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:187)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
at 
org.apache.solr.handler.PingRequestHandler.handlePing(PingRequestHandler.java:253)
at 
org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:210)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at 
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
at 
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
at 
org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1852)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:724)

2013-08-13/09:49:28.689/CEST|ERROR|http-apr-8080-exec-4|org.apache.solr.core.SolrCore.log:108|org.apache.solr.common.SolrException: 
Ping query caused exception: undefined field text
at 
org.apache.solr.handler.PingRequestHandler.handlePing(PingRequestHandler.java:262)
at 
org.apache.solr.handler.PingRequestHandler.handleRequestBody(PingRequestHandler.java:210)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at 
org.apache.catalina.core.ApplicationFilterChain.do

Setting hostPort in System properties

2013-08-13 Thread Prasi S
Hi,
When I set the Solr hostPort via Tomcat system properties, it is not working. If
I specify it in solr.xml then it is working. Is it mandatory that
hostPort should be set only in solr.xml?

Solr.xml setting:



  
${host:}
*${port:}*

Tomcat runtime setting:

set JAVA_OPTS=-Dprogram.name=%PROGNAME%
-Dlogging.configuration=file:%DIRNAME%logging.properties -DhostContext=solr
-Dhost=10.239.30.27 -DjhostPort=8080
Thanks,
Prasi
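
For reference, is the expectation that the property name referenced in solr.xml
and the -D flag have to be identical? e.g. (a sketch only; the exact solr.xml
syntax is my assumption):

solr.xml:   <int name="hostPort">${hostPort:8080}</int>
JAVA_OPTS:  ... -DhostPort=8080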


Re: developing custom tokenizer

2013-08-13 Thread dhaivat dave
Hi Alex,

Thanks for your reply. I looked into the core analyzers and created a custom
tokenizer based on them; I have shared the code below. On Solr's analysis page
the analyzer works fine, but when I submit 100 docs together I can see in the
logs (with custom messages printed) that for some of the documents the "create"
method of SampleTokeniserFactory is not called (please see the code below).

Can you please help me figure out what is wrong in the following code? Am I
missing something?

Here is the class which extends TokenizerFactory:

=== SampleTokeniserFactory.java

public class SampleTokeniserFactory extends TokenizerFactory {

public SampleTokeniserFactory(Map<String, String> args) {
super(args);
}

public SampleTokeniser create(AttributeFactory factory, Reader reader) {
return new SampleTokeniser(factory, reader);
}

}

Here is the class which extends Tokenizer:


package ns.solr.analyser;

import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import
org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

public class SampleTokeniser extends Tokenizer {

private List<Token> tokenList = new ArrayList<Token>();

int tokenCounter = -1;

private final CharTermAttribute termAtt =
addAttribute(CharTermAttribute.class);

/**
 * Object that defines the offset attribute
 */
private final OffsetAttribute offsetAttribute = (OffsetAttribute)
addAttribute(OffsetAttribute.class);

/**
 * Object that defines the position attribute
 */
private final PositionIncrementAttribute position =
(PositionIncrementAttribute) addAttribute(PositionIncrementAttribute.class);

public SampleTokeniser(AttributeFactory factory, Reader reader) {
super(factory, reader);
String textToProcess = null;
try {
textToProcess = readFully(reader);
processText(textToProcess);
} catch (IOException e) {
e.printStackTrace();
}

}

public String readFully(Reader reader) throws IOException {
char[] arr = new char[8 * 1024]; // 8K at a time
StringBuffer buf = new StringBuffer();
int numChars;
while ((numChars = reader.read(arr, 0, arr.length)) > 0) {
buf.append(arr, 0, numChars);
}
return buf.toString();
}

public void processText(String textToProcess) {

String wordsList[] = textToProcess.split(" ");

int startOffset = 0, endOffset = 0;

for (String word : wordsList) {

endOffset = word.length();

Token aToken = new Token("Token." + word, startOffset, endOffset);

aToken.setPositionIncrement(1);

tokenList.add(aToken);

startOffset = endOffset + 1;
}
}

@Override
public boolean incrementToken() throws IOException {

clearAttributes();
tokenCounter++;

if (tokenCounter < tokenList.size()) {
Token aToken = tokenList.get(tokenCounter);

termAtt.append(aToken);
termAtt.setLength(aToken.length());
offsetAttribute.setOffset(correctOffset(aToken.startOffset()),
correctOffset(aToken.endOffset()));
position.setPositionIncrement(aToken.getPositionIncrement());
return true;
}

return false;
}

/**
 * close object
 *
 * @throws IOException
 */
public void close() throws IOException {
super.close();
System.out.println("Close method called");

}

/**
 * called when end method gets called
 *
 * @throws IOException
 */
public void end() throws IOException {
super.end();
// setting final offset
System.out.println("end called with final offset");
}

/**
 * method reset the record
 *
 * @throws IOException
 */
public void reset() throws IOException {
super.reset();
System.out.println("Reset Called");
tokenCounter = -1;

}
}
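
For comparison, here is a minimal sketch of the same idea with the text
processing moved out of the constructor and into reset(). My working assumption
(not confirmed) is that Solr/Lucene 4.x reuses Tokenizer instances per thread,
so create() is only called the first time and later documents only trigger
setReader()/reset(); anything done in the constructor would then only ever see
the first document. The class name below is hypothetical, and the factory would
simply return this class instead:

package ns.solr.analyser;

import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
// in Lucene 4.x AttributeFactory is a nested class of AttributeSource
import org.apache.lucene.util.AttributeSource.AttributeFactory;

public class ResettableSampleTokeniser extends Tokenizer {

    private final List<Token> tokenList = new ArrayList<Token>();
    private int tokenCounter = -1;

    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
    private final PositionIncrementAttribute posAtt = addAttribute(PositionIncrementAttribute.class);

    public ResettableSampleTokeniser(AttributeFactory factory, Reader reader) {
        // Do NOT consume the reader here: the instance is reused across documents.
        super(factory, reader);
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        // reset() runs once per document, after setReader(), so every document is seen here.
        tokenList.clear();
        tokenCounter = -1;
        char[] arr = new char[8 * 1024];
        StringBuilder buf = new StringBuilder();
        int numChars;
        while ((numChars = input.read(arr, 0, arr.length)) > 0) {
            buf.append(arr, 0, numChars);
        }
        int start = 0;
        for (String word : buf.toString().split(" ")) {
            Token token = new Token("Token." + word, start, start + word.length());
            token.setPositionIncrement(1);
            tokenList.add(token);
            start += word.length() + 1;
        }
    }

    @Override
    public boolean incrementToken() throws IOException {
        clearAttributes();
        tokenCounter++;
        if (tokenCounter < tokenList.size()) {
            Token token = tokenList.get(tokenCounter);
            termAtt.append(token);
            offsetAtt.setOffset(correctOffset(token.startOffset()), correctOffset(token.endOffset()));
            posAtt.setPositionIncrement(token.getPositionIncrement());
            return true;
        }
        return false;
    }
}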


Many Thanks,
Dhaivat


On Mon, Aug 12, 2013 at 7:03 PM, Alexandre Rafalovitch
wrote:

> Have you tried looking at the source code itself? Between simple tokenizers like
> keyword and complex language ones, you should be able to get an idea. Then
> ask specific follow-up questions.
>
> Regards,
>  Alex
> On 12 Aug 2013 09:29, "dhaivat dave"  wrote:
>
> > Hello All,
> >
> > I want to create a custom tokenizer in Solr 4.4. It will be very helpful
> > if someone shares any tutorials or information on this.
> >
> >
> > Many Thanks,
> > Dhaivat Dave
> >
>



-- 
Regards
Dhaivat


Re: Measuring SOLR performance

2013-08-13 Thread Dmitry Kan
Hi Roman,

Something bad happened in a fresh checkout:

python solrjmeter.py -a -x ./jmx/SolrQueryTest.jmx -q
./queries/demo/demo.queries -s localhost -p 8983 -a --durationInSecs 60 -R
cms -t /solr/statements -e statements -U 100

Traceback (most recent call last):
  File "solrjmeter.py", line 1392, in 
main(sys.argv)
  File "solrjmeter.py", line 1347, in main
save_into_file('before-test.json', simplejson.dumps(before_test))
  File "/usr/lib/python2.7/dist-packages/simplejson/__init__.py", line 286,
in dumps
return _default_encoder.encode(obj)
  File "/usr/lib/python2.7/dist-packages/simplejson/encoder.py", line 226,
in encode
chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/dist-packages/simplejson/encoder.py", line 296,
in iterencode
return _iterencode(o, 0)
  File "/usr/lib/python2.7/dist-packages/simplejson/encoder.py", line 202,
in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: <__main__.ForgivingValue object at 0x7fc6d4040fd0> is not JSON
serializable


Regards,

D.


On Tue, Aug 13, 2013 at 8:10 AM, Roman Chyla  wrote:

> Hi Dmitry,
>
>
>
> On Mon, Aug 12, 2013 at 9:36 AM, Dmitry Kan  wrote:
>
> > Hi Roman,
> >
> > Good point. I managed to run the command with -C and double quotes:
> >
> > python solrjmeter.py -a -C "g1,cms" -c hour -x ./jmx/SolrQueryTest.jmx
> >
> > As a result got several files (html, css, js, csv) in the running
> directory
> > (any way to specify where the output should be stored in this case?)
> >
>
> i know it is confusing, i plan to change it - but later, now it is too busy
> here...
>
>
> >
> > When I look onto the comparison dashboard, I see this:
> >
> > http://pbrd.co/17IRI0b
> >
>
> two things: the tests probably took more than one hour to finish, so they
> are not aligned - try generating the comparison with '-c  14400'  (ie.
> 4x3600 secs)
>
> the other thing: if you have only two datapoints, the dygraph will not show
> anything - there must be more datapoints/measurements
>
>
>
> >
> > One more thing: all the previous tests were run with softCommit disabled.
> > After enabling it, the tests started to fail:
> >
> > $ python solrjmeter.py -a -x ./jmx/SolrQueryTest.jmx -q
> > ./queries/demo/demo.queries -s localhost -p 8983 -a --durationInSecs 60
> -R
> > g1 -t /solr/statements -e statements -U 100
> > $ cd g1
> > Reading results of the previous test
> > $ cd 2013.08.12.16.32.48
> > $ cd /home/dmitry/projects/lab/solrjmeter4/solrjmeter/g1
> > $ mkdir 2013.08.12.16.33.02
> > $ cd 2013.08.12.16.33.02
> > $ cd /home/dmitry/projects/lab/solrjmeter4/solrjmeter/g1
> > $ cd /home/dmitry/projects/lab/solrjmeter4/solrjmeter
> > $ cd /home/dmitry/projects/lab/solrjmeter4/solrjmeter
> > Traceback (most recent call last):
> >   File "solrjmeter.py", line 1427, in 
> > main(sys.argv)
> >   File "solrjmeter.py", line 1381, in main
> > before_test = harvest_details_about_montysolr(options)
> >   File "solrjmeter.py", line 562, in harvest_details_about_montysolr
> > indexLstModified = cores_data['status'][cn]['index']['lastModified'],
> > KeyError: 'lastModified'
> >
>
> Thanks for letting me know, that info is probably not available in this
> situation - i've cooked st quick to fix it, please try the latest commit
> (hope it doesn't do more harm, i should get some sleep ..;))
>
> roman
>
>
> >
> > In case it matters:  Python 2.7.3, ubuntu, solr 4.3.1.
> >
> > Thanks,
> >
> > Dmitry
> >
> >
> > On Thu, Aug 8, 2013 at 2:22 AM, Roman Chyla 
> wrote:
> >
> > > Hi Dmitry,
> > > The command seems good. Are you sure your shell is not doing something
> > > funny with the params? You could try:
> > >
> > > python solrjmeter.py -C "g1,foo" -c hour -x ./jmx/SolrQueryTest.jmx -a
> > >
> > > where g1 and foo are results of the individual runs, ie. something that
> > was
> > > started and saved with '-R g1' and '-R foo' respectively
> > >
> > > so, for example, i have these comparisons inside
> > > '/var/lib/montysolr/different-java-settings/solrjmeter', so I am
> > generating
> > > the comparison by:
> > >
> > > export
> > > SOLRJMETER_HOME=/var/lib/montysolr/different-java-settings/solrjmeter
> > > python solrjmeter.py -C "g1,foo" -c hour -x ./jmx/SolrQueryTest.jmx -a
> > >
> > >
> > > roman
> > >
> > >
> > > On Wed, Aug 7, 2013 at 10:03 AM, Dmitry Kan 
> > wrote:
> > >
> > > > Hi Roman,
> > > >
> > > > One more question. I tried to compare different runs (g1 vs cms)
> using
> > > the
> > > > command below, but get an error. Should I attach some other param(s)?
> > > >
> > > >
> > > > python solrjmeter.py -C g1,foo -c hour -x ./jmx/SolrQueryTest.jmx
> > > > **ERROR**
> > > >   File "solrjmeter.py", line 1427, in 
> > > > main(sys.argv)
> > > >   File "solrjmeter.py", line 1303, in main
> > > > check_options(options, args)
> > > >   File "solrjmeter.py", line 185, in check_options
> > > > error("The folder '%s' does not exist" % rf)
> > > >   File "solrjmeter.py", line 66, in error
> > > > traceback.

Re: Shard splitting failure, with and without composite hashing

2013-08-13 Thread Srivatsan
I am also getting the same error when performing shard splitting with Solr 4.4.0.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Shard-splitting-failure-with-and-without-composite-hashing-tp4083662p4084177.html
Sent from the Solr - User mailing list archive at Nabble.com.