RE: Facet sorting seems weird

2013-07-16 Thread Henrik Ossipoff Hansen
This is indeed an interesting idea, but I think it's a bit too manual for our 
use case. I do see that it would solve the problem though, so thank you for 
sharing it with the community! :)
 
-Original Message-
From: James Thomas [mailto:jtho...@camstar.com] 
Sent: 15 July 2013 17:08
To: solr-user@lucene.apache.org
Subject: RE: Facet sorting seems weird

Hi Henrik,

We did something related to this that I'll share.  I'm rather new to Solr so 
take this idea cautiously :-) Our requirement was to show exact values but have 
case-insensitive sorting and facet filtering (prefix filtering).

We created an index field (type=string) for creating facets so that the 
values are indexed as-is.
The values we indexed were given the format "lowercase value|exact value". So 
for example, given the value "bObles", we would index the string 
"bobles|bObles".
When displaying the facet we split the facet value from Solr on the pipe and 
display the second part to the user.
Of course the caveat is that you could have 2 facets that differ only in case, 
but to me that's a data cleansing issue.
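
A minimal sketch of both sides in Java (plain illustration, not an actual Solr API):

// Indexing side: build the facet value as "lowercase|exact".
String exact = "bObles";
String facetValue = exact.toLowerCase(java.util.Locale.ROOT) + "|" + exact;
// facetValue is "bobles|bObles"; index it into the string facet field.

// Display side: split on the first pipe and show the exact form.
String returned = "bobles|bObles";  // facet value returned by Solr
String display = returned.substring(returned.indexOf('|') + 1);  // "bObles"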

James

-Original Message-
From: Henrik Ossipoff Hansen [mailto:h...@entertainment-trading.com]
Sent: Monday, July 15, 2013 10:57 AM
To: solr-user@lucene.apache.org
Subject: RE: Facet sorting seems weird

Hello, thank you for the quick reply!

But given that facet.sort=index just sorts by the faceted index (and I don't 
want the facet itself to be in lower-case), would that really work?

Regards,
Henrik Ossipoff


-Original Message-
From: David Quarterman [mailto:da...@corexe.com]
Sent: 15 July 2013 16:46
To: solr-user@lucene.apache.org
Subject: RE: Facet sorting seems weird

Hi Henrik,

Try setting up a copyfield in your schema and set the copied field to use 
something like 'text_ws' which implements LowerCaseFilterFactory. Then sort on 
the copyfield.
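
A rough sketch of what that could look like in schema.xml (the field and type 
names here are made up; a sort field should reduce to a single lowercased 
token, e.g. KeywordTokenizer plus a lowercase filter):

<fieldType name="string_lc" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="brand_sort" type="string_lc" indexed="true" stored="false"/>
<copyField source="brand" dest="brand_sort"/>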

Regards,

DQ

-Original Message-
From: Henrik Ossipoff Hansen [mailto:h...@entertainment-trading.com]
Sent: 15 July 2013 15:08
To: solr-user@lucene.apache.org
Subject: Facet sorting seems weird

Hello, first time writing to the list. I am a developer for a company where we 
recently switched all of our search core from Sphinx to Solr with very great 
results. In general we've been very happy with the switch, and everything seems 
to work just as we want it to.

Today, however, we've run into a bit of an issue regarding faceted sort.

For example we have a field called brand in our core, defined as the text_en 
datatype from the example Solr core. This field is copied into facet_brand with 
the datatype string (since we don't really need to do much with it except show 
it for faceted navigation).

Now, given these two entries in the field on different documents, "LEGO" and 
"bObles", and given facet.sort=index, it appears that "LEGO" is sorted as being 
before "bObles". I assume this is because of casing differences.

My question then is, how do we define a decent datatype in our schema, where 
the casing is exact, but we are able to sort it without casing mattering?

Thank you :)

Best regards,
Henrik Ossipoff


RE: Facet sorting seems weird

2013-07-16 Thread Henrik Ossipoff Hansen
Hi Alex,

Yes, this makes sense. My Java is a bit rusty, but depending on how much we 
come to need this feature, it's definitely something we will look into 
creating, and if successful, we will definitely submit a patch. Thank you for 
your time and detailed answer!

Best regards,
Henrik Ossipoff

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: 15 July 2013 17:16
To: solr-user@lucene.apache.org
Subject: Re: Facet sorting seems weird

Hi Henrik,

If I understand the question correctly (case-insensitive sorting of the facet 
values), then this is the limitation of the current Facet component.

You can see the full implementation at:
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java#L818

If you are comfortable with Java code, the easiest thing might be to copy/fix 
the component and use your own one for faceting. The components are defined in 
solrconfig.xml and FacetComponent is in a default chain.
See:
https://github.com/apache/lucene-solr/blob/trunk/solr/example/solr/collection1/conf/solrconfig.xml#L1194
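
For illustration, registering a replacement could look roughly like this in 
solrconfig.xml (the class name is hypothetical); declaring a searchComponent 
with the reserved name "facet" replaces the default FacetComponent:

<searchComponent name="facet" class="com.example.CaseInsensitiveFacetComponent"/>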

If you do manage to do this (I would recommend doing it as an extra option), it 
would be nice to have it contributed back to Solr. I think you are not the only 
one with this requirement.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. 
Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Mon, Jul 15, 2013 at 10:08 AM, Henrik Ossipoff Hansen  
h...@entertainment-trading.com wrote:

 Hello, first time writing to the list. I am a developer for a company 
 where we recently switched all of our search core from Sphinx to Solr 
 with very great results. In general we've been very happy with the 
 switch, and everything seems to work just as we want it to.

 Today, however, we've run into a bit of an issue regarding faceted sort.

 For example we have a field called brand in our core, defined as the 
 text_en datatype from the example Solr core. This field is copied into 
 facet_brand with the datatype string (since we don't really need to do 
 much with it except show it for faceted navigation).

 Now, given these two entries into the field on different documents, LEGO
 and bObles, and given facet.sort=index, it appears that LEGO is 
 sorted as being before bObles. I assume this is because of casing differences.

 My question then is, how do we define a decent datatype in our schema, 
 where the casing is exact, but we are able to sort it without casing 
 mattering?

 Thank you :)

 Best regards,
 Henrik Ossipoff



Re: Clearing old nodes from zookeeper without restarting solrcloud cluster

2013-07-16 Thread Marcin Rzewucki
Hi,

You should use the CoreAdmin API (or the Solr Admin page) and UNLOAD unneeded
cores. This will unregister them from ZooKeeper (the cluster state will be
updated), so they won't be used for querying any longer. A SolrCloud restart
is not needed in this case.
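
For example (host and core name are placeholders):

http://localhost:8983/solr/admin/cores?action=UNLOAD&core=offline_core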

Regards.


On 16 July 2013 06:18, Ali, Saqib docbook@gmail.com wrote:

 Hello Luis,

 I don't think that is possible. If you delete clusterstate.json from
 zookeeper, you will need to restart the nodes.. I could be very wrong
 about this

 Saqib


 On Mon, Jul 15, 2013 at 8:50 PM, Luis Carlos Guerrero Covo 
 lcguerreroc...@gmail.com wrote:

  I know that you can clear zookeeper's data directory using the CLI with
 the
  clear command, I just want to know if its possible to update the
 cluster's
  state without wiping everything out. Anyone have any ideas/suggestions?
 
 
  On Mon, Jul 15, 2013 at 11:21 AM, Luis Carlos Guerrero Covo 
  lcguerreroc...@gmail.com wrote:
 
   Hi,
  
   Is there an easy way to clear zookeeper of all offline solr nodes
 without
   restarting the cluster? We are having some stability issues and we
 think
  it
   maybe due to the leader querying old offline nodes.
  
   thank you,
  
   Luis Guerrero
  
 
 
 
  --
  Luis Carlos Guerrero Covo
  M.S. Computer Engineering
  (57) 3183542047
 



select in clause in solr

2013-07-16 Thread smanad
I am using solr 4.3 and have 2 collections coll1, coll2.

After searching in coll1 I get field1 values as a comma-separated list
of strings like val1, val2, val3, ... valN.
How can I use that list to match field2 in coll2 against those values,
separated by an OR clause?
So I want to return all documents in coll2 with field2=val1 or field2=val2
or field2=val3 ... or field2=valN.

In short, I am looking for a "select ... in" type clause in Solr.

Any pointers will be much appreciated. 
-Manasi
 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/select-in-clause-in-solr-tp4078255.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: select in clause in solr

2013-07-16 Thread Oleg Burlaca
Hello Manasi,
Have a look at Solr pseudo joins http://wiki.apache.org/solr/Join
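
For example, run against coll2, something roughly like this should return the
coll2 documents whose field2 matches a field1 value from the coll1 matches
(assuming both cores live in the same Solr instance, since {!join} with
fromIndex only works across cores on the same node):

http://localhost:8983/solr/coll2/select?q={!join from=field1 to=field2 fromIndex=coll1}<your coll1 query>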

Regards
On Jul 16, 2013 9:54 AM, smanad sma...@gmail.com wrote:

 I am using solr 4.3 and have 2 collections coll1, coll2.

 After searching in coll1 I get field1 values which is a comma separated
 list
 of strings like, val1, val2, val3,... valN.
 How can I use that list to match field2 in coll2 with those values
 separated
 by an OR clause.
 So i want to return all documents in coll2 with field2=val1 or field2=val2
 or field2=val3 ... or field2=valN

 In short looking for select in  type clause in solr.

 Any pointers will be much appreciated.
 -Manasi






 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/select-in-clause-in-solr-tp4078255.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Range query on a substring.

2013-07-16 Thread Marcin Rzewucki
Hi,

I have a problem (wonder if it is possible to solve it at all) with the
following query. There are documents with a field which contains a text and
a number in brackets, eg.

myfield: "this is a text (number)"

There might be some other documents with the same text but a different number
in brackets.
I'd like to find documents with the given text, say "this is a text", and a
number between A and B. Is it possible in Solr? Any ideas?

Kind regards.


Re: Range query on a substring.

2013-07-16 Thread Oleg Burlaca
IMHO the number(s) should be extracted and stored in separate columns in
SOLR at indexing time.

--
Oleg


On Tue, Jul 16, 2013 at 10:12 AM, Marcin Rzewucki mrzewu...@gmail.comwrote:

 Hi,

 I have a problem (wonder if it is possible to solve it at all) with the
 following query. There are documents with a field which contains a text and
 a number in brackets, eg.

 myfield: this is a text (number)

 There might be some other documents with the same text but different number
 in brackets.
 I'd like to find documents with the given text say this is a text and
 number between A and B. Is it possible in Solr ? Any ideas ?

 Kind regards.



Re: Range query on a substring.

2013-07-16 Thread Marcin Rzewucki
Hi Oleg,
It's a multivalued field and it won't be easier to query when I split this
field into text and numbers. I may get wrong results.

Regards.


On 16 July 2013 09:35, Oleg Burlaca oburl...@gmail.com wrote:

 IMHO the number(s) should be extracted and stored in separate columns in
 SOLR at indexing time.

 --
 Oleg


 On Tue, Jul 16, 2013 at 10:12 AM, Marcin Rzewucki mrzewu...@gmail.com
 wrote:

  Hi,
 
  I have a problem (wonder if it is possible to solve it at all) with the
  following query. There are documents with a field which contains a text
 and
  a number in brackets, eg.
 
  myfield: this is a text (number)
 
  There might be some other documents with the same text but different
 number
  in brackets.
  I'd like to find documents with the given text say this is a text and
  number between A and B. Is it possible in Solr ? Any ideas ?
 
  Kind regards.
 



Re: Book contest idea - feedback requested

2013-07-16 Thread Andrea Lanzoni
Alex, I am a beginner and I find it a really good idea. A new forum 
dedicated to understanding existing features rather than missing ones would 
allow newcomers to post questions without cluttering the solr-user list, 
where people are already expert practitioners and prefer to see more 
targeted topics.

Let us know the follow-up.
Andrea


On Mon, Jul 15, 2013 at 8:11 PM, Alexandre Rafalovitch 
arafa...@gmail.comwrote:

Hello,

Packt Publishing has kindly agreed to let me run a contest with e-copies of
my book as prizes:
http://www.packtpub.com/apache-solr-for-indexing-data/book

Since my book is about learning Solr and targeted at beginners and early
intermediates, here is what I would like to do. I am asking for feedback on
whether people on the mailing list like the idea or have specific
objections to it.

1) The basic idea is to get Solr users to write and vote on what they find
hard with Solr, especially in understanding the features (as contrasted
with just missing ones).
2) I'll probably set it up as a User Voice forum, which has all the
mechanisms for suggesting and voting on ideas, with an easier interface
than JIRA.
3) The top N voted ideas will get the books as prizes and I will try to
fix/document/create JIRAs for those issues.
4) I am hoping to specifically reach out to the communities where Solr is a
component and where they don't necessarily hang out on our mailing list. I
am thinking SolrNet, Drupal, project Blacklight, Cloudera, CrafterCMS,
SiteCore, Typo3, SunSpot, Nutch. Obviously, anybody and everybody from this
list would be absolutely welcome to participate as well.

Yes? No? Suggestions?

Also, if you are maintainer of one of the products/services/libraries that
has Solr in it and want to reach out to your community yourself, I think it
would be a lot better than if I did it. Contact me directly and I will let
you know what template/FAQ I want you to include in the announcement
message when it is ready.

Thank you all in advance for the comments and suggestions.

Regards,
Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)





Re: Range query on a substring.

2013-07-16 Thread Oleg Burlaca
Ah, you mean something like this:
record:
Id=10, text = "this is a text N1 (X), another text N2 (Y), text N3 (Z)"
Id=11, text = "this is a text N1 (W), another text N2 (Q), third text (M)"

and you need to search for: "text N1" and X < B ?
How big is the core? The first thing that comes to my mind, again, at
indexing level: split the text into pieces and index it in Solr like this:

record_id | text    | value
10        | text N1 | X
10        | text N2 | Y
10        | text N3 | Z

Does it help?



On Tue, Jul 16, 2013 at 10:51 AM, Marcin Rzewucki mrzewu...@gmail.comwrote:

 Hi Oleg,
 It's a multivalued field and it won't be easier to query when I split this
 field into text and numbers. I may get wrong results.

 Regards.


 On 16 July 2013 09:35, Oleg Burlaca oburl...@gmail.com wrote:

  IMHO the number(s) should be extracted and stored in separate columns in
  SOLR at indexing time.
 
  --
  Oleg
 
 
  On Tue, Jul 16, 2013 at 10:12 AM, Marcin Rzewucki mrzewu...@gmail.com
  wrote:
 
   Hi,
  
   I have a problem (wonder if it is possible to solve it at all) with the
   following query. There are documents with a field which contains a text
  and
   a number in brackets, eg.
  
   myfield: this is a text (number)
  
   There might be some other documents with the same text but different
  number
   in brackets.
   I'd like to find documents with the given text say this is a text and
   number between A and B. Is it possible in Solr ? Any ideas ?
  
   Kind regards.
  
 



AW: About Suggestions

2013-07-16 Thread Lochschmied, Alexander
Hi Eric and everybody else!

Thanks for trying to help. Here is the example: 

.../terms?terms.regex.flag=case_insensitive&terms.fl=suggest&terms=true&terms.limit=20&terms.sort=index&terms.prefix=1n1187

returns

<int name="1n1187">1</int>
<int name="1n1187a">1</int>
<int name="1n1187r">1</int>
<int name="1n1187ra">1</int>

This list contains 3 complete part numbers but the third item (1n1187r) is not 
a complete part number. Is there a way to make terms tell if a term represents 
a complete value?
(My guess is that this gets lost after ngram but I'm still hoping something can 
be done.)

More config details:

<field name="suggest" type="text_parts" indexed="true" stored="true"
       required="false" multiValued="true"/>

and

<fieldType name="text_parts" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
            maxGramSize="20" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Thanks,
Alexander


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Saturday, 13 July 2013 19:58
To: solr-user@lucene.apache.org
Subject: Re: About Suggestions

Not quite sure what you mean here, a couple of examples would help.

But since the term is using keyword tokenizer, then each thing you get back is 
a complete term, by definition. So I'm not quite sure what you're asking 
here.

Best
Erick

On Fri, Jul 12, 2013 at 4:48 AM, Lochschmied, Alexander 
alexander.lochschm...@vishay.com wrote:
 Hi Solr people!

 We need to suggest part numbers in alphabetical order, adding up to four 
 characters to the already entered part number prefix. That works quite well 
 with the terms component acting on a multivalued field with a keyword tokenizer 
 and edge nGram filter. I am mentioning "part numbers" to indicate that each item 
 in the multivalued field is a string without whitespace, where special 
 characters like dashes cannot be seen as separators.

 Is there a way to know if the term (the suggestion) represents such a 
 complete part number (without doing another query for each suggestion)?

 Since we are using SolrJ, what we would need is something like
 boolean Term.isRepresentingCompleteFieldValue()

 Thanks,
 Alexander


Re: Range query on a substring.

2013-07-16 Thread Marcin Rzewucki
By multivalued I meant an array of values. For example:
<arr name="myfield">
  <str>text1 (X)</str>
  <str>text2 (Y)</str>
</arr>

I'd like to avoid splitting it as you propose. I have a 2.3M-document
collection with pretty large records (a few hundred fields or more per
record). Duplicating them would impact performance.

Regards.



On 16 July 2013 10:26, Oleg Burlaca oburl...@gmail.com wrote:

 Ah, you mean something like this:
 record:
 Id=10, text =  this is a text N1 (X), another text N2 (Y), text N3 (Z)
 Id=11, text =  this is a text N1 (W), another text N2 (Q), third text (M)

 and you need to search for: "text N1" and X < B ?
 How big is the core? the first thing that comes to my mind, again, at
 indexing level,
 split the text into pieces and index it in solr like this:

 record_id | text  | value
 10   | text N1 | X
 10   | text N2 | Y
 10   | text N3 | Z

 does it help?



 On Tue, Jul 16, 2013 at 10:51 AM, Marcin Rzewucki mrzewu...@gmail.com
 wrote:

  Hi Oleg,
  It's a multivalued field and it won't be easier to query when I split
 this
  field into text and numbers. I may get wrong results.
 
  Regards.
 
 
  On 16 July 2013 09:35, Oleg Burlaca oburl...@gmail.com wrote:
 
   IMHO the number(s) should be extracted and stored in separate columns
 in
   SOLR at indexing time.
  
   --
   Oleg
  
  
   On Tue, Jul 16, 2013 at 10:12 AM, Marcin Rzewucki mrzewu...@gmail.com
   wrote:
  
Hi,
   
I have a problem (wonder if it is possible to solve it at all) with
 the
following query. There are documents with a field which contains a
 text
   and
a number in brackets, eg.
   
myfield: this is a text (number)
   
There might be some other documents with the same text but different
   number
in brackets.
I'd like to find documents with the given text say this is a text
 and
number between A and B. Is it possible in Solr ? Any ideas ?
   
Kind regards.
   
  
 



Re: How to change extracted directory

2013-07-16 Thread wolbi
As I said, if I change it in context.xml it works... but the question is
how to do it from the command line, without modifying config files.
Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-change-extracted-directory-tp4078024p4078284.html
Sent from the Solr - User mailing list archive at Nabble.com.


[solr 3.4.1] collections: meaning and necessity

2013-07-16 Thread Dmitry Kan
Hello list,

Following the answer by Jayendra here:

http://stackoverflow.com/questions/14516279/how-to-add-collections-to-solr-core


Re: [solr 3.4.1] collections: meaning and necessity

2013-07-16 Thread Dmitry Kan
Sorry, hit send too fast..

picking up:

from the answer by Jayendra on the link, collections and cores are the same
thing. Same is seconded by the config:

  <cores adminPath="/admin/cores" defaultCoreName="collection1"
         host="${host:}" hostPort="${jetty.port:8983}"
         hostContext="${hostContext:solr}"
         zkClientTimeout="${zkClientTimeout:15000}">
    <core name="collection1" instanceDir="." />
  </cores>

we basically define cores.

We have a plain {frontend_solr, shards} setup with solr 3.4 and were
thinking of starting off with it initially in solr 4. In solr 4: can one
get by without using collections = cores?

We also don't plan on using SolrCloud at the moment. So from our standpoint
the solr4 configuration looks more complicated, than that of solr 3.4. Are
there any benefits of such a setup for non SolrCloud users?

Thanks,

Dmitry



On Tue, Jul 16, 2013 at 2:24 PM, Dmitry Kan solrexp...@gmail.com wrote:

 Hello list,

 Following the answer by Jayendra here:


 http://stackoverflow.com/questions/14516279/how-to-add-collections-to-solr-core



Re: Apache Solr 4 - after 1st commit the index does not grow

2013-07-16 Thread Erick Erickson
First, when switching subjects please start a new thread. It gets
confusing to have multiple topics; it's called "thread hijacking".

Second, I have no clue why your Nutch output is outputting
invalid characters. Sounds like
1> your custom plugin is doing something weird
or
2> something you could configure in Nutch. So I'd recommend
asking on the Nutch board.

Best
Erick

On Mon, Jul 15, 2013 at 11:40 AM, glumet jan.bouch...@gmail.com wrote:
 As I can see, this is the same problem like one from older posts -
 http://lucene.472066.n3.nabble.com/strange-utf-8-problem-td3094473.html
 ...but it was without any response.



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Apache-Solr-4-after-1st-commit-the-index-does-not-grow-tp4077913p4078079.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Live reload

2013-07-16 Thread O. Klein
I used the reload command to apply changes in synonyms.txt for example, but
with the new mechanism https://wiki.apache.org/solr/CoreAdmin#LiveReload
this will not work anymore.

Is there another way to reload config files instead of restarting Solr?
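
For reference, the reload command mentioned above is the CoreAdmin RELOAD
action, along these lines (host and core name are placeholders):

http://localhost:8983/solr/admin/cores?action=RELOAD&core=mycore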



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Live-reload-tp4078318.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ACL implementation: Pseudo-join performance Atomic Updates

2013-07-16 Thread Erick Erickson
Roman:

Did this ever make it into a JIRA? Somehow I missed it if it did, and this
would be pretty cool.

Erick

On Mon, Jul 15, 2013 at 6:52 PM, Roman Chyla roman.ch...@gmail.com wrote:
 On Sun, Jul 14, 2013 at 1:45 PM, Oleg Burlaca oburl...@gmail.com wrote:

 Hello Erick,

  Join performance is most sensitive to the number of values
  in the field being joined on. So if you have lots and lots of
  distinct values in the corpus, join performance will be affected.
 Yep, we have a list of unique Id's that we get by first searching for
 records
 where loggedInUser IS IN (userIDs)
 This corpus is stored in memory I suppose? (not a problem) and then the
 bottleneck is to match this huge set with the core where I'm searching?

 Somewhere in maillist archive people were talking about external list of
 Solr unique IDs
 but didn't find if there is a solution.
 Back in 2010 Yonik posted a comment:
 http://find.searchhub.org/document/363a4952446b3cd#363a4952446b3cd


 sorry, I haven't read the previous thread in its entirety, but a few weeks back
 Yonik's proposal got implemented, it seems ;)

 http://search-lucene.com/m/Fa3Dg14mqoj/bitsetsubj=Re+Solr+large+boolean+filter

 You could use this to send very large bitset filter (which can be
 translated into any integers, if you can come up with a mapping function).

 roman



  bq: I suppose the delete/reindex approach will not change soon
  There is ongoing work (search the JIRA for Stacked Segments)
 Ah, ok, I was feeling it affects the architecture, ok, now the only hope is
 Pseudo-Joins ))

  One way to deal with this is to implement a post filter, sometimes
 called
  a no cache filter.
 thanks, will have a look, but as you describe it, it's not the best option.

 The approach of
 "too many documents, man. Please refine your query. Partial results below"
 means faceting will not work correctly?

 ... I have in mind a hybrid approach, comments welcome:
 Most of the time users are not searching, but browsing content, so our
 virtual filesystem stored in SOLR will use only the index with the Id of
 the file and the list of users that have access to it. i.e. not touching
 the fulltext index at all.

 Files may have metadata (EXIF info for images for ex) that we'd like to
 filter by, calculate facets.
 Meta will be stored in both indexes.

 In case of a fulltext query:
 1. search FT index (the fulltext index), get only the number of search
 results, let it be Rf
 2. search DAC index (the index with permissions), get number of search
 results, let it be Rd

 let maxR be the maximum size of the corpus for the pseudo-join.
 That was actually my question: what is a reasonable number? 10, 100, 1000?

 if (Rf < maxR) or (Rd < maxR) then use the smaller corpus to join onto the
 second one.
 This happens when (only a few documents contain the search query) OR (the user
 has access to a small number of files).

 In case none of these happens, we can use the
 "too many documents, man. Please refine your query. Partial results below"
 approach, but first searching the FT index, because we want relevant results first.

 What do you think?

 Regards,
 Oleg




 On Sun, Jul 14, 2013 at 7:42 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  Join performance is most sensitive to the number of values
  in the field being joined on. So if you have lots and lots of
  distinct values in the corpus, join performance will be affected.
 
  bq: I suppose the delete/reindex approach will not change soon
 
  There is ongoing work (search the JIRA for Stacked Segments)
  on actually doing something about this, but it's been under
 consideration
  for at least 3 years so your guess is as good as mine.
 
  bq: notice that the worst situation is when everyone has access to all
 the
  files, it means the first filter will be the full index.
 
   One way to deal with this is to implement a post filter, sometimes
   called a "no cache" filter. The distinction here is that
   1> it is not cached (duh!)
   2> it is only called for documents that have made it through all the
   other lower-cost filters (and the main query of course).
   3> "lower cost" means standard, cached filters and any no-cache filters
   with a cost (explicitly stated in the query) lower than this one's.
 
  Critically, and unlike normal filter queries, the result set is NOT
  calculated for all documents ahead of time
 
  You _still_ have to deal with the sysadmin doing a *:* query as you
  are well aware. But one can mitigate that by having the post-filter
  fail all documents after some arbitrary N, and display a message in the
  app like too many documents, man. Please refine your query. Partial
  results below. Of course this may not be acceptable, but
 
  HTH
  Erick
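
For a concrete picture, here is a minimal sketch of such a post filter against
the Solr 4.x PostFilter API (the class itself and the ACL check are
hypothetical):

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

public class AclPostFilter extends ExtendedQueryBase implements PostFilter {
    @Override
    public boolean getCache() { return false; }  // post filters are never cached

    @Override
    public int getCost() {
        return Math.max(super.getCost(), 100);   // cost >= 100 marks a post filter
    }

    @Override
    public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
        return new DelegatingCollector() {
            @Override
            public void collect(int doc) throws IOException {
                // called only for docs that passed the main query and cheaper filters
                if (isAllowed(doc)) {
                    super.collect(doc);          // pass allowed docs down the chain
                }
            }
        };
    }

    // hypothetical per-document ACL check
    private boolean isAllowed(int doc) {
        return true;  // placeholder: look up the current user's rights here
    }
}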
 
  On Sun, Jul 14, 2013 at 12:05 PM, Jack Krupansky
  j...@basetechnology.com wrote:
   Take a look at LucidWorks Search and its access control:
  
 
 http://docs.lucidworks.com/display/help/Search+Filters+for+Access+Control
  
   Role-based 

Re: About Suggestions

2013-07-16 Thread Erick Erickson
Garbage in, garbage out <g>

Your indexing analysis chain is breaking up the tokens via the
EdgeNgramTokenizer and _putting those values in the index_.
Then the TermsComponent is looking _only_ at the tokens in
the index and giving you back exactly what you're asking for.

So no, there's no way with that analysis chain to get only complete
terms, at that level the fact that a term was part of a larger
input token has been lost. In fact, if you were to enter something
like terms.prefix=1n1 you'd likely see all your 3-grams that start
with 1n1 etc.

So use a copyField and put these in a separate field that has
only whole tokens, or just take the EdgeNGram filter out of
your current definition. If the latter, blow away your index and re-index
from scratch.
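
A rough sketch of the copyField route (field and type names here are made up):
keep the existing ngrammed field for prefix matching, and add a parallel field
with whole tokens that can tell you whether a suggestion is a complete part
number:

<field name="suggest_full" type="text_parts_full" indexed="true"
       stored="false" multiValued="true"/>
<copyField source="suggest" dest="suggest_full"/>

<fieldType name="text_parts_full" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

A terms lookup with terms.fl=suggest_full&terms.prefix=1n1187 would then
return only complete part numbers, which can be intersected with the ngram
suggestions.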

Best
Erick

On Tue, Jul 16, 2013 at 4:48 AM, Lochschmied, Alexander
alexander.lochschm...@vishay.com wrote:
 Hi Eric and everybody else!

 Thanks for trying to help. Here is the example:

 .../terms?terms.regex.flag=case_insensitive&terms.fl=suggest&terms=true&terms.limit=20&terms.sort=index&terms.prefix=1n1187

 returns

 <int name="1n1187">1</int>
 <int name="1n1187a">1</int>
 <int name="1n1187r">1</int>
 <int name="1n1187ra">1</int>

 This list contains 3 complete part numbers but the third item (1n1187r) is 
 not a complete part number. Is there a way to make terms tell if a term 
 represents a complete value?
 (My guess is that this gets lost after ngram but I'm still hoping something 
 can be done.)

 More config details:

 <field name="suggest" type="text_parts" indexed="true" stored="true"
        required="false" multiValued="true"/>

 and

 <fieldType name="text_parts" class="solr.TextField"
            positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
             maxGramSize="20" side="front"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 Thanks,
 Alexander


 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Saturday, 13 July 2013 19:58
 To: solr-user@lucene.apache.org
 Subject: Re: About Suggestions

 Not quite sure what you mean here, a couple of examples would help.

 But since the term is using keyword tokenizer, then each thing you get back 
 is a complete term, by definition. So I'm not quite sure what you're asking 
 here.

 Best
 Erick

 On Fri, Jul 12, 2013 at 4:48 AM, Lochschmied, Alexander 
 alexander.lochschm...@vishay.com wrote:
 Hi Solr people!

 We need to suggest part numbers in alphabetical order, adding up to four 
 characters to the already entered part number prefix. That works quite well 
 with terms component acting on a multivalued field with keyword tokenizer 
 and edge nGram filter. I am mentioning part numbers to indicate that each 
 item in the multivalued field is a string without whitespace and where 
 special characters like dashes cannot be seen as separators.

 Is there a way to know if the term (the suggestion) represents such a 
 complete part number (without doing another query for each suggestion)?

 Since we are using SolrJ, what we would need is something like
 boolean Term.isRepresentingCompleteFieldValue()

 Thanks,
 Alexander


Re: Different 'fl' for first X results

2013-07-16 Thread Erick Erickson
You could also use a DocTransformer. But really, unless these
fields are quite long it seems overkill to do anything but ignore
them when returned for docs you don't care about.

Best
Erick

On Mon, Jul 15, 2013 at 7:05 PM, Jack Krupansky j...@basetechnology.com wrote:
 SOLR-5005 - JavaScriptRequestHandler
 https://issues.apache.org/jira/browse/SOLR-5005

 -- Jack Krupansky

 -Original Message- From: Alexandre Rafalovitch
 Sent: Monday, July 15, 2013 6:56 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Different 'fl' for first X results


 Is there a JIRA number for the last one?

 Regards,
 Alex
 On 15 Jul 2013 17:21, Jack Krupansky j...@basetechnology.com wrote:

 1. Request all fields needed for all results and simply ignore the extra
 field(s) (which can be empty or missing and will automatically be ignored
 by Solr anyway).
 2. Two separate query requests.
 3. A custom search component.
 4. Wait for the new scripted query request handler that gives you full
 control in a custom script.

 -- Jack Krupansky

 -Original Message- From: Weber
 Sent: Monday, July 15, 2013 4:58 PM
 To: solr-user@lucene.apache.org
 Subject: Different 'fl' for first X results

 How to get a different field list in the first X results? For example, in
 the
 first 5 results I want fields A, B, C, and on the next results I need only
 fields A, and B.



 --
  View this message in context:
  http://lucene.472066.n3.nabble.com/Different-fl-for-first-X-results-tp4078178.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: ACL implementation: Pseudo-join performance Atomic Updates

2013-07-16 Thread Alexandre Rafalovitch
Is that this one: https://issues.apache.org/jira/browse/SOLR-1913 ?

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Tue, Jul 16, 2013 at 8:01 AM, Erick Erickson erickerick...@gmail.comwrote:

 Roman:

 Did this ever make into a JIRA? Somehow I missed it if it did, and this
 would
 be pretty cool

 Erick

 On Mon, Jul 15, 2013 at 6:52 PM, Roman Chyla roman.ch...@gmail.com
 wrote:
  On Sun, Jul 14, 2013 at 1:45 PM, Oleg Burlaca oburl...@gmail.com
 wrote:
 
  Hello Erick,
 
   Join performance is most sensitive to the number of values
   in the field being joined on. So if you have lots and lots of
   distinct values in the corpus, join performance will be affected.
  Yep, we have a list of unique Id's that we get by first searching for
  records
  where loggedInUser IS IN (userIDs)
  This corpus is stored in memory I suppose? (not a problem) and then the
  bottleneck is to match this huge set with the core where I'm searching?
 
  Somewhere in maillist archive people were talking about external list
 of
  Solr unique IDs
  but didn't find if there is a solution.
  Back in 2010 Yonik posted a comment:
  http://find.searchhub.org/document/363a4952446b3cd#363a4952446b3cd
 
 
  sorry, haven't the previous thread in its entirety, but few weeks back
 that
  Yonik's proposal got implemented, it seems ;)
 
 
 http://search-lucene.com/m/Fa3Dg14mqoj/bitsetsubj=Re+Solr+large+boolean+filter
 
  You could use this to send very large bitset filter (which can be
  translated into any integers, if you can come up with a mapping
 function).
 
  roman
 
 
 
   bq: I suppose the delete/reindex approach will not change soon
   There is ongoing work (search the JIRA for Stacked Segments)
  Ah, ok, I was feeling it affects the architecture, ok, now the only
 hope is
  Pseudo-Joins ))
 
   One way to deal with this is to implement a post filter, sometimes
  called
   a no cache filter.
  thanks, will have a look, but as you describe it, it's not the best
 option.
 
  The approach
  too many documents, man. Please refine your query. Partial results
 below
  means faceting will not work correctly?
 
  ... I have in mind a hybrid approach, comments welcome:
  Most of the time users are not searching, but browsing content, so our
  virtual filesystem stored in SOLR will use only the index with the Id
 of
  the file and the list of users that have access to it. i.e. not touching
  the fulltext index at all.
 
  Files may have metadata (EXIF info for images for ex) that we'd like to
  filter by, calculate facets.
  Meta will be stored in both indexes.
 
  In case of a fulltext query:
  1. search FT index (the fulltext index), get only the number of search
  results, let it be Rf
  2. search DAC index (the index with permissions), get number of search
  results, let it be Rd
 
  let maxR be the maximum size of the corpus for the pseudo-join.
   That was actually my question: what is a reasonable number? 10, 100,
   1000?
 
   if (Rf < maxR) or (Rd < maxR) then use the smaller corpus to join onto
   the second one.
   This happens when (only a few documents contain the search query) OR
   (the user has access to a small number of files).
 
  In case none of these happens, we can use the
  too many documents, man. Please refine your query. Partial results
 below
  but first searching the FT index, because we want relevant results
 first.
 
  What do you think?
 
  Regards,
  Oleg
 
 
 
 
  On Sun, Jul 14, 2013 at 7:42 PM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
 
   Join performance is most sensitive to the number of values
   in the field being joined on. So if you have lots and lots of
   distinct values in the corpus, join performance will be affected.
  
   bq: I suppose the delete/reindex approach will not change soon
  
   There is ongoing work (search the JIRA for Stacked Segments)
   on actually doing something about this, but it's been under
  consideration
   for at least 3 years so your guess is as good as mine.
  
   bq: notice that the worst situation is when everyone has access to all
  the
   files, it means the first filter will be the full index.
  
    One way to deal with this is to implement a post filter, sometimes
    called a "no cache" filter. The distinction here is that
    1> it is not cached (duh!)
    2> it is only called for documents that have made it through all the
    other lower-cost filters (and the main query of course).
    3> "lower cost" means standard, cached filters and any no-cache filters
    with a cost (explicitly stated in the query) lower than this one's.
  
   Critically, and unlike normal filter queries, the result set is NOT
   calculated for all documents ahead of time
  
   You _still_ have to deal with the sysadmin 

Re: How to use joins in solr 4.3.1

2013-07-16 Thread Erick Erickson
Not quite sure what's the problem with the second, but the
first is:
q=:

That just isn't legal, try q=*:*
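
For what it's worth, a form that combines both fixes and should at least parse
(assuming the merchant name is searched in a field such as name; adjust to the
actual schema):

http://_server_.com:8983/solr/location/select?q=*:*&fq={!join from=merchantId to=merchantId fromIndex=merchant}name:walgreens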

As for the second, are there any other errors in the solr log?
Sometimes what's returned in the response packet does not
include the true source of the problem.

Best
Erick

On Mon, Jul 15, 2013 at 7:40 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote:
 I have also tried these queries (as per this SO answer:
 http://stackoverflow.com/questions/12665797/is-solr-4-0-capable-of-using-join-for-multiple-core
 )

 1. http://_server_.com:8983/solr/location/select?q=:&fq={!join
 from=merchantId to=merchantId fromIndex=merchant}walgreens

 And I get this:

 {
   responseHeader:{
 status:400,
 QTime:1,
 params:{
   indent:true,
   q::,
   wt:json,
   fq:{!join from=merchantId to=merchantId
 fromIndex=merchant}walgreens}},
   error:{
 msg:org.apache.solr.search.SyntaxError: Cannot parse ':':
 Encountered \ \:\ \: \\ at line 1, column 0.\nWas expecting one
 of:\nNOT ...\n\+\ ...\n\-\ ...\nBAREOPER ...\n
\(\ ...\n\*\ ...\nQUOTED ...\nTERM ...\n
 PREFIXTERM ...\nWILDTERM ...\nREGEXPTERM ...\n\[\
 ...\n\{\ ...\nLPARAMS ...\nNUMBER ...\nTERM
 ...\n\*\ ...\n,
 code:400}}

 And this:
 2.http://_server_.com:8983/solr/location/select?q=walgreensfq={!join
 from=merchantId to=merchantId fromIndex=merchant}

 {
   responseHeader:{
 status:500,
 QTime:5,
 params:{
   indent:true,
   q:walgreens,
   wt:json,
   fq:{!join from=merchantId to=merchantId fromIndex=merchant}}},
   error:{
 msg:Server at http://_SERVER_:8983/solr/location returned non
 ok status:500, message:Server Error,
 
 trace:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
 Server at http://_SERVER_:8983/solr/location returned non ok
 status:500, message:Server Error\n\tat
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)\n\tat
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)\n\tat
 org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:156)\n\tat
 org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)\n\tat
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)\n\tat
 java.util.concurrent.FutureTask.run(FutureTask.java:138)\n\tat
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)\n\tat
 java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)\n\tat
 java.util.concurrent.FutureTask.run(FutureTask.java:138)\n\tat
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)\n\tat
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)\n\tat
 java.lang.Thread.run(Thread.java:662)\n,
 code:500}}

 Thanks,
 -Utkarsh



 On Mon, Jul 15, 2013 at 4:27 PM, Utkarsh Sengar utkarsh2...@gmail.comwrote:

 Hello,

 I am trying to join data between two cores: merchant and location

 This is my query:
 http://_server_.com:8983/solr/location/select?q={!join from=merchantId
 to=merchantId fromIndex=merchant}walgreens
 Ref: http://wiki.apache.org/solr/Join


 The merchant core has documents for the query "walgreens" with a merchantId
 of 1.
  A simple query: http://_server_.com:8983/solr/location/select?q=walgreens
 returns documents called walgreens with merchantId=1

 Location core has documents with merchantId=1 too.

 But my join query returns no documents.

 This is the response I get:
 {
   responseHeader:{
 status:0,
 QTime:5,
 params:{
   debugQuery:true,
   indent:true,
   q:{!join from=merchantId to=merchantId
 fromIndex=merchant}walgreens,
   wt:json}},
   response:{numFound:0,start:0,maxScore:0.0,docs:[]
   },
   debug:{
 rawquerystring:{!join from=merchantId to=merchantId
 fromIndex=merchant}walgreens,
 querystring:{!join from=merchantId to=merchantId
 fromIndex=merchant}walgreens,
 parsedquery:JoinQuery({!join from=merchantId to=merchantId
 fromIndex=merchant}allText:walgreens),
 parsedquery_toString:{!join from=merchantId to=merchantId
 fromIndex=merchant}allText:walgreens,
 QParser:,
 explain:{}}}


 Any suggestions?


 --
 Thanks,
 -Utkarsh




 --
 Thanks,
 -Utkarsh


Re: Range query on a substring.

2013-07-16 Thread Jack Krupansky
Sorry, but you are basically misusing Solr (and multivalued fields), trying 
to take a shortcut to avoid a proper data model.


To properly use Solr, you need to put each of these multivalued field values 
in a separate Solr document, with a text field and a value field. Then, 
you can query:


   text:"some text" AND value:[min-value TO max-value]

Exactly how you should restructure your data model is dependent on all of 
your other requirements.


You may be able to simply flatten your data.
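
For instance, a flattened version of the earlier example might look like this
(field names are illustrative):

<add>
  <doc>
    <field name="id">10-1</field>
    <field name="record_id">10</field>
    <field name="text">text N1</field>
    <field name="value">42</field>
  </doc>
  <doc>
    <field name="id">10-2</field>
    <field name="record_id">10</field>
    <field name="text">text N2</field>
    <field name="value">7</field>
  </doc>
</add>

with value as a numeric type (e.g. tint), so that
text:"text N1" AND value:[10 TO 100] works as a normal range query.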

You may be able to use a simple join operation.

Or, maybe you need to do a multi-step query operation if your data is 
sufficiently complex.


If you want to keep your multivalued field in its current form for display 
purposes or keyword search, or exact match search, fine, but your stated 
goal is inconsistent with the Semantics of Solr and Lucene.


To be crystal clear, there is no such thing as a "range query on a 
substring" in Solr or Lucene.


-- Jack Krupansky

-Original Message- 
From: Marcin Rzewucki

Sent: Tuesday, July 16, 2013 5:13 AM
To: solr-user@lucene.apache.org
Subject: Re: Range query on a substring.

By multivalued I meant an array of values. For example:
arr name=myfield
 strtext1 (X)/str
 strtext2 (Y)/str
/arr

I'd like to avoid spliting it as you propose. I have 2.3mn collection with
pretty large records (few hundreds fields and more per record). Duplicating
them would impact performance.

Regards.



On 16 July 2013 10:26, Oleg Burlaca oburl...@gmail.com wrote:


Ah, you mean something like this:
record:
Id=10, text =  this is a text N1 (X), another text N2 (Y), text N3 (Z)
Id=11, text =  this is a text N1 (W), another text N2 (Q), third text 
(M)


and you need to search for: "text N1" and X < B ?
How big is the core? the first thing that comes to my mind, again, at
indexing level,
split the text into pieces and index it in solr like this:

record_id | text  | value
10   | text N1 | X
10   | text N2 | Y
10   | text N3 | Z

does it help?



On Tue, Jul 16, 2013 at 10:51 AM, Marcin Rzewucki mrzewu...@gmail.com
wrote:

 Hi Oleg,
 It's a multivalued field and it won't be easier to query when I split
this
 field into text and numbers. I may get wrong results.

 Regards.


 On 16 July 2013 09:35, Oleg Burlaca oburl...@gmail.com wrote:

  IMHO the number(s) should be extracted and stored in separate columns
in
  SOLR at indexing time.
 
  --
  Oleg
 
 
  On Tue, Jul 16, 2013 at 10:12 AM, Marcin Rzewucki mrzewu...@gmail.com
  wrote:
 
   Hi,
  
   I have a problem (wonder if it is possible to solve it at all) with
the
   following query. There are documents with a field which contains a
text
  and
   a number in brackets, eg.
  
   myfield: this is a text (number)
  
   There might be some other documents with the same text but different
  number
   in brackets.
   I'd like to find documents with the given text say this is a text
and
   number between A and B. Is it possible in Solr ? Any ideas ?
  
   Kind regards.
  
 






AW: About Suggestions

2013-07-16 Thread Lochschmied, Alexander
Thanks Erick, that is what I suspected. We are very happy with the four 
suggestions in the example (and all the others), but we would like to know 
which of them represent a full part number.
Can you elaborate a little more on how that could be achieved?

Best regards,
Alexander

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, 16 July 2013 14:09
To: solr-user@lucene.apache.org
Subject: Re: About Suggestions

Garbage in, garbage out <g>

Your indexing analysis chain is breaking up the tokens via the 
EdgeNgramTokenizer and _putting those values in the index_.
Then the TermsComponent is looking _only_ at the tokens in the index and giving 
you back exactly what you're asking for.

So no, there's no way with that analysis chain to get only complete terms, at 
that level the fact that a term was part of a larger input token has been lost. 
In fact, if you were to enter something like terms.prefix=1n1 you'd likely see 
all your 3-grams that start with 1n1 etc.

So use a copyField and put these in a separate field that has only whole tokens, 
or just take the EdgeNGram filter out of your current definition. If the 
latter, blow away your index and re-index from scratch.

Best
Erick

On Tue, Jul 16, 2013 at 4:48 AM, Lochschmied, Alexander 
alexander.lochschm...@vishay.com wrote:
 Hi Eric and everybody else!

 Thanks for trying to help. Here is the example:

 .../terms?terms.regex.flag=case_insensitive&terms.fl=suggest&terms=true&terms.limit=20&terms.sort=index&terms.prefix=1n1187

 returns

 <int name="1n1187">1</int>
 <int name="1n1187a">1</int>
 <int name="1n1187r">1</int>
 <int name="1n1187ra">1</int>

 This list contains 3 complete part numbers but the third item (1n1187r) is 
 not a complete part number. Is there a way to make terms tell if a term 
 represents a complete value?
 (My guess is that this gets lost after ngram but I'm still hoping 
 something can be done.)

 More config details:

 <field name="suggest" type="text_parts" indexed="true" stored="true"
        required="false" multiValued="true"/>

 and

 <fieldType name="text_parts" class="solr.TextField"
            positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
             maxGramSize="20" side="front"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 Thanks,
 Alexander


 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Saturday, 13 July 2013 19:58
 To: solr-user@lucene.apache.org
 Subject: Re: About Suggestions

 Not quite sure what you mean here, a couple of examples would help.

 But since the term is using keyword tokenizer, then each thing you get back 
 is a complete term, by definition. So I'm not quite sure what you're asking 
 here.

 Best
 Erick

 On Fri, Jul 12, 2013 at 4:48 AM, Lochschmied, Alexander 
 alexander.lochschm...@vishay.com wrote:
 Hi Solr people!

 We need to suggest part numbers in alphabetical order, adding up to four 
 characters to the already entered part number prefix. That works quite well 
 with terms component acting on a multivalued field with keyword tokenizer 
 and edge nGram filter. I am mentioning part numbers to indicate that each 
 item in the multivalued field is a string without whitespace and where 
 special characters like dashes cannot be seen as separators.

 Is there a way to know if the term (the suggestion) represents such a 
 complete part number (without doing another query for each suggestion)?

 Since we are using SolrJ, what we would need is something like
 boolean Term.isRepresentingCompleteFieldValue()

 Thanks,
 Alexander


Re: Clearing old nodes from zookeeper without restarting solrcloud cluster

2013-07-16 Thread Luis Carlos Guerrero Covo
Thanks, I was actually asking about deleting nodes from the cluster state,
not cores, unless you can unload cores specific to an already offline node
from ZooKeeper.


On Tue, Jul 16, 2013 at 1:55 AM, Marcin Rzewucki mrzewu...@gmail.comwrote:

 Hi,

 You should use CoreAdmin API (or Solr Admin page) and UNLOAD unneeded
 cores. This will unregister them from the zookeeper (cluster state will be
 updated), so they won't be used for querying any longer. Solrcloud restart
 is not needed in this case.

 Regards.


 On 16 July 2013 06:18, Ali, Saqib docbook@gmail.com wrote:

  Hello Luis,
 
  I don't think that is possible. If you delete clusterstate.json from
  zookeeper, you will need to restart the nodes.. I could be very wrong
  about this
 
  Saqib
 
 
  On Mon, Jul 15, 2013 at 8:50 PM, Luis Carlos Guerrero Covo 
  lcguerreroc...@gmail.com wrote:
 
    I know that you can clear zookeeper's data directory using the CLI with
  the
   clear command, I just want to know if its possible to update the
  cluster's
   state without wiping everything out. Anyone have any ideas/suggestions?
  
  
   On Mon, Jul 15, 2013 at 11:21 AM, Luis Carlos Guerrero Covo 
   lcguerreroc...@gmail.com wrote:
  
Hi,
   
Is there an easy way to clear zookeeper of all offline solr nodes
  without
restarting the cluster? We are having some stability issues and we
  think
   it
maybe due to the leader querying old offline nodes.
   
thank you,
   
Luis Guerrero
   
  
  
  
   --
   Luis Carlos Guerrero Covo
   M.S. Computer Engineering
   (57) 3183542047
  
 




-- 
Luis Carlos Guerrero Covo
M.S. Computer Engineering
(57) 3183542047


Are analysers applied to each value in a multi-valued field separately?

2013-07-16 Thread Daniel Collins
I'm guessing the answer is yes, but here's the background.

We index 2 separate fields, headline and body text, for a document, and then
we want to identify the "top" of the story, which is the headline + N words
of the body (we want to weight that in scoring).

So do to that:

<copyField source="headline" dest="top"/>
<copyField source="body" dest="top"/>

And the top field has a LimitTokenCountFilterFactory appended to it to do
the limiting.

<filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="N"/>

I realised that top needs to be multi-valued, which got me thinking: is
that N tokens PER VALUE of top or N tokens in total within the top field...
 The field is indexed but not stored, so it's hard to determine exactly
which is being done.

Logically, I presume each value in the field is independent (and Solr then
just matches searches against each one), so that would suggest N is per
value?

Cheers, Daniel


Re: [solr 3.4.1] collections: meaning and necessity

2013-07-16 Thread Alexandre Rafalovitch
If you only have one collection and no Solr cloud, then don't use solr.xml
at all. It will automatically assume 'collection1' as a name.

If you do want to have some control (shards, etc), do not include the
optional parameters you do not need. See example here:
http://my.safaribooksonline.com/book/databases/9781782164845/1dot-instant-apache-solr-for-indexing-data-how-to/ch01s02_html

You don't even need defaultCoreName attribute, if you are happy to always
include core name in the URL.
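
A minimal multi-core solr.xml along those lines might look like this (core
names are placeholders):

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" />
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>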

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Tue, Jul 16, 2013 at 7:28 AM, Dmitry Kan solrexp...@gmail.com wrote:

 Sorry, hit send too fast..

 picking up:

 from the answer by Jayendra on the link, collections and cores are the same
 thing. Same is seconded by the config:

    <cores adminPath="/admin/cores" defaultCoreName="collection1"
           host="${host:}" hostPort="${jetty.port:8983}"
           hostContext="${hostContext:solr}"
           zkClientTimeout="${zkClientTimeout:15000}">
      <core name="collection1" instanceDir="." />
    </cores>

 we basically define cores.

 We have a plain {frontend_solr, shards} setup with solr 3.4 and were
 thinking of starting off with it initially in solr 4. In solr 4: can one
 get by without using collections = cores?

 We also don't plan on using SolrCloud at the moment. So from our standpoint
 the solr4 configuration looks more complicated, than that of solr 3.4. Are
 there any benefits of such a setup for non SolrCloud users?

 Thanks,

 Dmitry



 On Tue, Jul 16, 2013 at 2:24 PM, Dmitry Kan solrexp...@gmail.com wrote:

  Hello list,
 
  Following the answer by Jayendra here:
 
 
 
 http://stackoverflow.com/questions/14516279/how-to-add-collections-to-solr-core
 



solr 4.3.1 Installation

2013-07-16 Thread Sujatha Arun
Hi ,

We have been using Solr 3.6.1. Recently we downloaded the Solr 4.3.1 version
and installed it as a multicore setup as follows.

Folder structure:
solr.war
solr/
  core0/
    conf/
  core1/
    conf/
  solr.xml

Created the context fragment xml file in tomcat/conf/catalina/localhost
which refers to the solr.war file and the solr home folder

copied the multicore conf folder without the zoo.cfg file

I get the following error and admin page does not load
16 Jul, 2013 11:36:09 PM org.apache.catalina.core.StandardContext start
SEVERE: Error filterStart
16 Jul, 2013 11:36:09 PM org.apache.catalina.core.StandardContext start
SEVERE: Context [/solr_4.3.1] startup failed due to previous errors
16 Jul, 2013 11:36:39 PM org.apache.catalina.startup.HostConfig
checkResources
INFO: Undeploying context [/solr_4.3.1]
16 Jul, 2013 11:36:39 PM org.apache.catalina.core.StandardContext start
SEVERE: Error filterStart
16 Jul, 2013 11:36:39 PM org.apache.catalina.core.StandardContext start
SEVERE: Context [/solr_4.3.1] startup failed due to previous errors


Please let me know what I am missing if I need to install this with the
default multicore setup without the cloud. Thanks.

Regards
Sujatha


Re: Doc's FunctionQuery result field in my custom SearchComponent class ?

2013-07-16 Thread Jack Krupansky
Basically, the evaluation of function queries in the fl parameter occurs 
when the response writer is composing the document results. That's AFTER all 
of the search components are done.


SolrReturnFields.getTransformer() gets the DocTransformer, which is really a 
DocTransformers, and then a call to DocTransformers.transform() in each 
response writer will evaluate the embedded function queries and insert their 
values in the results as they are being written.
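
If the value is needed inside the search component itself, one option is to
compute it directly against the index instead of reading it from the doc. A
rough sketch against the Lucene 4.x API (field and term taken from the query
in this thread; error handling omitted, and rb/docId are assumed to come from
the surrounding process() loop):

import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.util.BytesRef;

// inside process(ResponseBuilder rb), for a given docId:
IndexReader reader = rb.req.getSearcher().getIndexReader();
DocsEnum de = MultiFields.getTermDocsEnum(reader,
    MultiFields.getLiveDocs(reader), "product", new BytesRef("spider"));
int freq = 0;
if (de != null && de.advance(docId) == docId) {
    freq = de.freq();  // termfreq(product,'spider') for this document
}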


-- Jack Krupansky

-Original Message- 
From: Tony Mullins

Sent: Tuesday, July 16, 2013 1:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Doc's FunctionQuery result field in my custom SearchComponent 
class ?


No sorry, I am still not getting the termfreq() field in my 'doc' object.
I do get the _version_ field in my 'doc' object which I think is
realValue=StoredField.

At which point does termfreq() or any other FunctionQuery field become part
of the doc object in Solr? And at that point can I perform some custom logic
and append it to the response?

Thanks.
Tony





On Tue, Jul 16, 2013 at 1:34 AM, Patanachai Tangchaisin 
patanachai.tangchai...@wizecommerce.com wrote:


Hi,

I think the process of retrieving a stored field (through fl) happens
after the SearchComponents.

One solution: if you wrap the q param with a function, your score will be
the result of the function.
For example,

http://localhost:8080/solr/collection2/demoendpoint?q=termfreq%28product,%27spider%27%29&wt=xml&indent=true&fl=*,score


Now your score is going to be a result of termfreq(product,'spider')


--
Patanachai Tangchaisin



On 07/15/2013 12:01 PM, Tony Mullins wrote:


any help plz !!!


On Mon, Jul 15, 2013 at 4:13 PM, Tony Mullins tonymullins...@gmail.com wrote:

 Please any help on how to get the value of 'freq' field in my custom

SearchComponent ?


http://localhost:8080/solr/collection2/demoendpoint?q=spider&wt=xml&indent=true&fl=*,freq:termfreq%28product,%27spider%27%29

<doc><str name="id">11</str><str name="type">Video Games</str><str
name="format">xbox 360</str><str name="product">The Amazing
Spider-Man</str><int name="popularity">11</int><long
name="_version_">1439994081345273856</long><int
name="freq">1</int></doc>



Here is my code

DocList docs = rb.getResults().docList;
 DocIterator iterator = docs.iterator();
 int sumFreq = 0;
 String id = null;

 for (int i = 0; i  docs.size(); i++) {
 try {
 int docId = iterator.nextDoc();

// Document doc = searcher.doc(docId, fieldSet);
 Document doc = searcher.doc(docId);

In doc object I can see the schema fields like 'id', 'type','format' 
etc.

but I cannot find the field 'freq' which I needed. Is there any way to
get
the FunctionQuery fields in doc object ?

Thanks,
Tony



On Mon, Jul 15, 2013 at 1:16 PM, Tony Mullins tonymullins...@gmail.com
wrote:

 Hi,


I have extended Solr's SearchComonent class and I am iterating through
all the docs in ResponseBuilder in @overrider Process() method.

Here I want to get the value of FucntionQuery result but in Document
object I am only seeing the standard field of document not the
FucntionQuery result.

This is my query


http://localhost:8080/solr/collection2/demoendpoint?q=spider&wt=xml&indent=true&fl=*,freq:termfreq%28product,%27spider%27%29

Result of above query in browser shows me that 'freq' is part of doc
but its not there in Document object in my @overrider Process() method.

How can I get the value of FunctionQuery result in my custom
SearchComponent ?

Thanks,
Tony












SolrCloud softcommit problem

2013-07-16 Thread giovanni.bricc...@banzai.it

Hi

I'm using solr version 4.3.1. I have a core with only one shard and 
three replicas, say  server1, server2 and server3.

Suppose server1 is currently the leader

if I send an update to the leader everything works fine

wget -O - --header='Content-type: text/xml' 
--post-data='<add><doc><field name="sku">16910</field><field name="name" 
update="set">yy</field></doc></add>' 
'server1:8080/solr/mycore/update?softCommit=true'


querying server 1 server2 and server3 I see the right answer, always 
yy


if instead I do send an update to a replica, say server2

wget -O - --header='Content-type: text/xml' 
--post-data='<add><doc><field name="sku">16910</field><field name="name" 
update="set">z</field></doc></add>' 
'server2:8080/solr/mycore/update?softCommit=true'


I see on server1 (leader) and server3 the correct value 'z' but 
server2 continues to show the wrong value, yy, until I send a commit.


Am I using the update API correctly?

Thanks


Giovanni
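
For reference, a rough SolrJ equivalent of the second wget above (a sketch 
assuming the SolrJ 4.x API; the URL and field values match the example):

import java.util.Collections;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.params.UpdateParams;

public class SoftCommitUpdate {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://server2:8080/solr/mycore");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("sku", "16910");
    // atomic "set" update, as in the XML message above
    doc.addField("name", Collections.singletonMap("set", "z"));
    UpdateRequest req = new UpdateRequest();
    req.add(doc);
    req.setParam(UpdateParams.SOFT_COMMIT, "true"); // softCommit=true
    req.process(solr);
  }
}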





Re: Are analysers applied to each value in a multi-valued field separately?

2013-07-16 Thread Jack Krupansky
Yes, each input value is analyzed separately. Solr passes each input value 
to Lucene and then Lucene analyzes each.


You could use LimitTokenPositionFilterFactory which uses the absolute token 
position - each successive analyzed value would have an incremented 
position, plus the positionIncrementGap (typically 100 for text.)


-- Jack Krupansky

-Original Message- 
From: Daniel Collins

Sent: Tuesday, July 16, 2013 8:46 AM
To: solr-user@lucene.apache.org
Subject: Are analysers applied to each value in a multi-valued field 
separately?


I'm guessing the answer is yes, but here's the background.

We index 2 separate fields, headline and body text for a document, and then
we want to identify the "top" of the story, which is the headline + N words
of the body (we want to weight that in scoring).

So to do that:

<copyField src="headline" dest="top"/>
<copyField src="body" dest="top"/>

And the top field has a LimitTokenCountFilterFactory appended to it to do
the limiting.

   <filter class="solr.LimitTokenCountFilterFactory"
maxTokenCount="N"/>

I realised that "top" needs to be multi-valued, which got me thinking: is
that N tokens PER VALUE of top or N tokens in total within the top field...
The field is indexed but not stored, so it's hard to determine exactly
which is being done.

Logically, I presume each value in the field is independent (and Solr then
just matches searches against each one), so that would suggest N is per
value?

Cheers, Daniel 



Need advice on performing 300 queries per second on solr index

2013-07-16 Thread adfel70
Hi
I need to create a solr cluster that contains geospatial information and
provides the ability to perform a few hundreds queries per second, each
query should retrieve around 100k results.
The data is around 100k documents, around 300gb total.

I started with 2 shard cluster (replicationFactor 1) and a portion of the
data - 20 gb.

I run some load-tests and see that when 100 requests are sent in one second,
the average qTime is around 4 seconds, but the average total response time
(measuring from sending the request to solr until getting a response)
reaches 20-25 seconds, which is very bad.

Currently I load-balance myself between the 2 solr servers (each request is
sent to another server)

Any advice on  which resources do I need and how my solr cluster should look
like?
More shards? more replicas? another webserver?

Thanks.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-advice-on-performing-300-queries-per-second-on-solr-index-tp4078353.html
Sent from the Solr - User mailing list archive at Nabble.com.


Config changes in solr.DirectSolrSpellCheck after index is built?

2013-07-16 Thread Brendan Grainger
Hi All,

Can you change the configuration of a spellchecker
using solr.DirectSolrSpellCheck after you've built an index? I know that
this spellchecker doesn't build an index off to the side like
the IndexBasedSpellChecker so I'm wondering what's happening internally to
create a spellchecking dictionary.

Thanks
Brendan

-- 
Brendan Grainger
www.kuripai.com


Re: Need advice on performing 300 queries per second on solr index

2013-07-16 Thread Michael Della Bitta
Have you looked at cache utilization?
Have you checked the IO and CPU load to see what the bottlenecks are?
Are you sure things like your heap and servlet container threads are tuned?

After you look at those issues, I'd probably think about adding http
caching and more replicas.

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions https://twitter.com/Appinions | g+:
plus.google.com/appinions
w: appinions.com http://www.appinions.com/


On Tue, Jul 16, 2013 at 10:42 AM, adfel70 adfe...@gmail.com wrote:

 Hi
 I need to create a solr cluster that contains geospatial information and
 provides the ability to perform a few hundreds queries per second, each
 query should retrieve around 100k results.
 The data is around 100k documents, around 300gb total.

 I started with 2 shard cluster (replicationFactor 1) and a portion of the
 data - 20 gb.

 I run some load-tests and see that when 100 requests are sent in one
 second,
 the average qTime is around 4 seconds, but the average total response time
 (measuring from sending the request to solr until getting a response)
 reaches 20-25 seconds which is very bad.

 Currently I load-balance myself between the 2 solr servers (each request is
 sent to another server)

 Any advice on  which resources do I need and how my solr cluster should
 look
 like?
 More shards? more replicas? another webserver?

 Thanks.





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Need-advice-on-performing-300-queries-per-second-on-solr-index-tp4078353.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: SolrCloud softcommit problem

2013-07-16 Thread Daniel Collins
I think this is SOLR-4923 https://issues.apache.org/jira/browse/SOLR-4923,
should be fixed in 4.4 (when it comes out) or grab the branch_4x branch
from svn.



On 16 July 2013 14:12, giovanni.bricc...@banzai.it 
giovanni.bricc...@banzai.it wrote:

 Hi

 I'm using solr version 4.3.1. I have a core with only one shard and three
 replicas, say  server1, server2 and server3.
 Suppose server1 is currently the leader

 if I send an update to the leader everything works fine

 wget -O - --header='Content-type: text/xml' --post-data='<add><doc><field
 name="sku">16910</field><field name="name"
 update="set">yy</field></doc></add>'
 'server1:8080/solr/mycore/update?softCommit=true'

 querying server 1 server2 and server3 I see the right answer, always
 yy

 if instead I do send an update to a replica, say server2

 wget -O - --header='Content-type: text/xml' --post-data='<add><doc><field
 name="sku">16910</field><field name="name"
 update="set">z</field></doc></add>'
 'server2:8080/solr/mycore/update?softCommit=true'

 I see on server1 (leader) and server3 the correct value 'z' but
 server2 continues to show the wrong value, yy, until I send a commit.

 Am I using the update API correctly?

 Thanks


 Giovanni






Re: Are analysers applied to each value in a multi-valued field separately?

2013-07-16 Thread Daniel Collins
Thanks Jack.

There seem to be a never ending set of FilterFactories, I keep hearing
about new ones all the time :)

Ok, I get it, so our existing code is the first N tokens of each value, and
using LimitTokenPositionFilterFactory with the same number would give us
the first N of the combined set of tokens, that's good to know.



On 16 July 2013 14:15, Jack Krupansky j...@basetechnology.com wrote:

 Yes, each input value is analyzed separately. Solr passes each input value
 to Lucene and then Lucene analyzes each.

 You could use LimitTokenPositionFilterFactory which uses the absolute
 token position - each successive analyzed value would have an incremented
 position, plus the positionIncrementGap (typically 100 for text.)

 -- Jack Krupansky

 -Original Message- From: Daniel Collins
 Sent: Tuesday, July 16, 2013 8:46 AM
 To: solr-user@lucene.apache.org
 Subject: Are analysers applied to each value in a multi-valued field
 separately?


 I'm guessing the answer is yes, but here's the background.

 We index 2 separate fields, headline and body text for a document, and then
 we want to identify the "top" of the story, which is the headline + N words
 of the body (we want to weight that in scoring).

 So to do that:

 <copyField src="headline" dest="top"/>
 <copyField src="body" dest="top"/>

 And the top field has a LimitTokenCountFilterFactory appended to it to do
 the limiting.

   <filter class="solr.LimitTokenCountFilterFactory"
  maxTokenCount="N"/>

 I realised that "top" needs to be multi-valued, which got me thinking: is
 that N tokens PER VALUE of top or N tokens in total within the top field...
 The field is indexed but not stored, so it's hard to determine exactly
 which is being done.

 Logically, I presume each value in the field is independent (and Solr then
 just matches searches against each one), so that would suggest N is per
 value?

 Cheers, Daniel



Re: Are analysers applied to each value in a multi-valued field separately?

2013-07-16 Thread Daniel Collins
Self-correction, we'd need to set LimitTokenPositionFilterFactory to PI
+ N to give the results above because of the increment gap between values.


On 16 July 2013 17:16, Daniel Collins danwcoll...@gmail.com wrote:

 Thanks Jack.

 There seem to be a never ending set of FilterFactories, I keep hearing
 about new ones all the time :)

 Ok, I get it, so our existing code is the first N tokens of each value,
 and using LimitTokenPositionFilterFactory with the same number would
 give us the first N of the combined set of tokens, that's good to know.



 On 16 July 2013 14:15, Jack Krupansky j...@basetechnology.com wrote:

 Yes, each input value is analyzed separately. Solr passes each input
 value to Lucene and then Lucene analyzes each.

 You could use LimitTokenPositionFilterFactory which uses the absolute
 token position - each successive analyzed value would have an incremented
 position, plus the positionIncrementGap (typically 100 for text.)

 -- Jack Krupansky

 -Original Message- From: Daniel Collins
 Sent: Tuesday, July 16, 2013 8:46 AM
 To: solr-user@lucene.apache.org
 Subject: Are analysers applied to each value in a multi-valued field
 separately?


 I'm guessing the answer is yes, but here's the background.

 We index 2 separate fields, headline and body text for a document, and
 then
 we want to identify the "top" of the story, which is the headline + N words
 of the body (we want to weight that in scoring).

 So to do that:

 <copyField src="headline" dest="top"/>
 <copyField src="body" dest="top"/>

 And the top field has a LimitTokenCountFilterFactory appended to it to
 do
 the limiting.

   <filter class="solr.LimitTokenCountFilterFactory"
 maxTokenCount="N"/>

 I realised that "top" needs to be multi-valued, which got me thinking: is
 that N tokens PER VALUE of top or N tokens in total within the top
 field...
 The field is indexed but not stored, so it's hard to determine exactly
 which is being done.

 Logically, I presume each value in the field is independent (and Solr then
 just matches searches against each one), so that would suggest N is per
 value?

 Cheers, Daniel





Re: Need advice on performing 300 queries per second on solr index

2013-07-16 Thread Daniel Collins
You only have a 20Gb collection but is that per machine or total
collection, so 10Gb per machine?  What memory do you have available on
those 2 machines, is it enough to get the collection into the disk cache?
 What OS is it (linux/windows, etc)?
What heap size does your JVM have?
Is it a static collection or are you updating it as well?

4s for a query to 25s end-to-end time seems a long disparity to me; I'd be
curious as to where the time is going. SolrCloud will distribute the
initial queries out to the shards (but with fl=uniquekey,score), then it
sends a second request once it has the list of documents, with
fl=whatever you asked for to get the stored fields.  Might be interesting
to see, if the query is 4s, how long the stored field request takes (if
it's long you might want to consider docValues or ask for less!).

If you are using SolrCloud, you should be able to see the distributed
requests (we see 3 per user request: distributed (on each shard),
storedfields (on each shard that returned something) and then the user
request on the machine you sent the request to), see if that gives you any
indications where the time is going?




On 16 July 2013 16:12, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 Have you looked at cache utilization?
 Have you checked the IO and CPU load to see what the bottlenecks are?
 Are you sure things like your heap and servlet container threads are tuned?

 After you look at those issues, I'd probably think about adding http
 caching and more replicas.

 Michael Della Bitta

 Applications Developer

 o: +1 646 532 3062  | c: +1 917 477 7906

 appinions inc.

 “The Science of Influence Marketing”

 18 East 41st Street

 New York, NY 10017

 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinions
 w: appinions.com http://www.appinions.com/


 On Tue, Jul 16, 2013 at 10:42 AM, adfel70 adfe...@gmail.com wrote:

  Hi
  I need to create a solr cluster that contains geospatial information and
  provides the ability to perform a few hundreds queries per second, each
  query should retrieve around 100k results.
  The data is around 100k documents, around 300gb total.
 
  I started with 2 shard cluster (replicationFactor 1) and a portion of the
  data - 20 gb.
 
  I run some load-tests and see that when 100 requests are sent in one
  second,
  the average qTime is around 4 seconds, but the average total response
 time
  (measuring from sending the request to solr until getting a response)
  reaches 20-25 seconds which is very bad.
 
  Currently I load-balance myself between the 2 solr servers (each request
 is
  sent to another server)
 
  Any advice on  which resources do I need and how my solr cluster should
  look
  like?
  More shards? more replicas? another webserver?
 
  Thanks.
 
 
 
 
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/Need-advice-on-performing-300-queries-per-second-on-solr-index-tp4078353.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 



Re: Need advice on performing 300 queries per second on solr index

2013-07-16 Thread Walter Underwood
Are you requesting all 100K results in one request? If so, that is pretty fast.

If you are doing that, don't do that. Page the results.

wunder
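
A minimal paging sketch in SolrJ (assuming the SolrJ 4.x API; the URL, query
and page size are illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PagedQuery {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    final int rows = 1000; // fetch ~100k results in 1000-doc pages
    int start = 0;
    while (true) {
      SolrQuery q = new SolrQuery("*:*").setStart(start).setRows(rows);
      QueryResponse rsp = solr.query(q);
      // ... process rsp.getResults() here ...
      if (start + rows >= rsp.getResults().getNumFound()) break;
      start += rows;
    }
  }
}

Note that very deep offsets get progressively more expensive for Solr to
serve, so keep the pages reasonably large.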

On Jul 16, 2013, at 9:30 AM, Daniel Collins wrote:

 You only have a 20Gb collection but is that per machine or total
 collection, so 10Gb per machine?  What memory do you have available on
 those 2 machines, is it enough to get the collection into the disk cache?
 What OS is it (linux/windows, etc)?
 What heap size does your JVM have?
 Is it a static collection or are you updating it as well?
 
 4s for a query to 25s end-to-end time seems a long disparity to me; I'd be
 curious as to where the time is going. SolrCloud will distribute the
 initial queries out to the shards (but with fl=uniquekey,score), then it
 sends a second request once it has the list of documents, with
 fl=whatever you asked for to get the stored fields.  Might be interesting
 to see, if the query is 4s, how long the stored field request takes (if
 it's long you might want to consider docValues or ask for less!).
 
 If you are using SolrCloud, you should be able to see the distributed
 requests (we see 3 per user request: distributed (on each shard),
 storedfields (on each shard that returned something) and then the user
 request on the machine you sent the request to), see if that gives you any
 indications where the time is going?
 
 
 
 
 On 16 July 2013 16:12, Michael Della Bitta 
 michael.della.bi...@appinions.com wrote:
 
 Have you looked at cache utilization?
 Have you checked the IO and CPU load to see what the bottlenecks are?
 Are you sure things like your heap and servlet container threads are tuned?
 
 After you look at those issues, I'd probably think about adding http
 caching and more replicas.
 
 Michael Della Bitta
 
 Applications Developer
 
 o: +1 646 532 3062  | c: +1 917 477 7906
 
 appinions inc.
 
 “The Science of Influence Marketing”
 
 18 East 41st Street
 
 New York, NY 10017
 
 t: @appinions https://twitter.com/Appinions | g+:
 plus.google.com/appinions
 w: appinions.com http://www.appinions.com/
 
 
 On Tue, Jul 16, 2013 at 10:42 AM, adfel70 adfe...@gmail.com wrote:
 
 Hi
 I need to create a solr cluster that contains geospatial information and
 provides the ability to perform a few hundreds queries per second, each
 query should retrieve around 100k results.
 The data is around 100k documents, around 300gb total.
 
 I started with 2 shard cluster (replicationFactor 1) and a portion of the
 data - 20 gb.
 
 I run some load-tests and see that when 100 requests are sent in one
 second,
 the average qTime is around 4 seconds, but the average total response
 time
 (measuring from sending the request to solr until getting a response)
 reaches 20-25 seconds which is very bad.
 
 Currently I load-balance myself between the 2 solr servers (each request
 is
 sent to another server)
 
 Any advice on  which resources do I need and how my solr cluster should
 look
 like?
 More shards? more replicas? another webserver?
 
 Thanks.
 
 
 
 
 
 --
 View this message in context:
 
 http://lucene.472066.n3.nabble.com/Need-advice-on-performing-300-queries-per-second-on-solr-index-tp4078353.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 

--
Walter Underwood
wun...@wunderwood.org





Re: Live reload

2013-07-16 Thread Chris Hostetter

: I used the reload command to apply changes in synonyms.txt for example, but
: with the new mechanism https://wiki.apache.org/solr/CoreAdmin#LiveReload  
: this will not work anymore.

the Live reload doesn't affect schema.xml settings and analyzers (like 
changing stopwords or synonyms) ... when you reload, you should see your 
new synonyms.txt file loaded.

if you don't think you are seeing that behavior, then you need to provide 
a lot more details about what version you are using, what steps you are 
trying, and what behavior you *are* seeing so that we can understand what 
problem you might be having...  

https://wiki.apache.org/solr/UsingMailingLists

i just did a simple sanity test on the 4x branch where i ran some stuff 
through the analyzer UI screen, then changed the synonyms file and did a 
reload and saw the changes i expected when i re-loaded the analysis page.



-Hoss


Re: Are analysers applied to each value in a multi-valued field separately?

2013-07-16 Thread Jack Krupansky
Actually, I appear to be wrong on the position limit filter - it appears to 
be relative to the string being analyzed and not the full sequence of values 
analyzed for the field.


Given this field and type:

<fieldType name="text_limit_position4" class="solr.TextField"
positionIncrementGap="10">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LimitTokenPositionFilterFactory"
maxTokenPosition="23"/>
  </analyzer>
</fieldType>

<field name="text_limit4" type="text_limit_position4"
  indexed="true" stored="true" multiValued="true" />

And this document:

curl "http://localhost:8983/solr/update?commit=true" \
-H 'Content-type:application/json' -d '
[{"id": "doc-1",
  "title": "Hello World",
  "text_limit4": ["a1 a2 a3 a4", "b1 b2 b3 b4", "c1 c2 c3 c4",
  "d1 d2 d3 d4", "e1 e2 e3 e4", "f1 f2 f3 f4"]}]'


The hope was that the indexed sequence of terms would stop at c4, but the 
full values are indexed. These queries succeed:


curl "http://localhost:8983/solr/select/?q=text_limit4:d1"

curl "http://localhost:8983/solr/select/?q=text_limit4:f4"

And this query fails:

curl "http://localhost:8983/solr/select/?q=text_limit4:%22a4+f1%22~65"

While this query succeeds:

curl "http://localhost:8983/solr/select/?q=text_limit4:%22a4+f1%22~66"

Indicating that the position gaps of 10 are there between each value, but 
the token position limit filter doesn't trigger.


This document:

curl "http://localhost:8983/solr/update?commit=true" \
-H 'Content-type:application/json' -d '
[{"id": "doc-1",
  "title": "Hello World",
  "text_limit4": "a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15 a16 a17 
a18 a19 a20 a21 a22 a23 a24 a25 a26"}]'


Fails on this query:

curl "http://localhost:8983/solr/select/?q=text_limit4:a24"

But succeeds on this query:

curl "http://localhost:8983/solr/select/?q=text_limit4:a23"

Indicating that the token position limit filter does work, but only for the 
relative position, making it not much more useful than the token count limit 
filter.


Oh well.

-- Jack Krupansky

-Original Message- 
From: Daniel Collins

Sent: Tuesday, July 16, 2013 12:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Are analysers applied to each value in a multi-valued field 
separately?


Self-correction, we'd need to set LimitTokenPositionFilterFactory to PI
+ N to give the results above because of the increment gap between values.


On 16 July 2013 17:16, Daniel Collins danwcoll...@gmail.com wrote:


Thanks Jack.

There seem to be a never ending set of FilterFactories, I keep hearing
about new ones all the time :)

Ok, I get it, so our existing code is the first N tokens of each value,
and using LimitTokenPositionFilterFactory with the same number would
give us the first N of the combined set of tokens, that's good to know.



On 16 July 2013 14:15, Jack Krupansky j...@basetechnology.com wrote:


Yes, each input value is analyzed separately. Solr passes each input
value to Lucene and then Lucene analyzes each.

You could use LimitTokenPositionFilterFactory which uses the absolute
token position - each successive analyzed value would have an incremented
position, plus the positionIncrementGap (typically 100 for text.)

-- Jack Krupansky

-Original Message- From: Daniel Collins
Sent: Tuesday, July 16, 2013 8:46 AM
To: solr-user@lucene.apache.org
Subject: Are analysers applied to each value in a multi-valued field
separately?


I'm guessing the answer is yes, but here's the background.

We index 2 separate fields, headline and body text for a document, and
then
we want to identify the "top" of the story, which is the headline + N words
of the body (we want to weight that in scoring).

So to do that:

<copyField src="headline" dest="top"/>
<copyField src="body" dest="top"/>

And the top field has a LimitTokenCountFilterFactory appended to it to
do
the limiting.

   <filter class="solr.LimitTokenCountFilterFactory"
maxTokenCount="N"/>

I realised that "top" needs to be multi-valued, which got me thinking: is
that N tokens PER VALUE of top or N tokens in total within the top
field...
The field is indexed but not stored, so it's hard to determine exactly
which is being done.

Logically, I presume each value in the field is independent (and Solr 
then

just matches searches against each one), so that would suggest N is per
value?

Cheers, Daniel








Re: Doc's FunctionQuery result field in my custom SearchComponent class ?

2013-07-16 Thread Tony Mullins
OK, so that's why I cannot see the FunctionQuery fields in my
SearchComponent class.
So then the question would be: how can I apply my custom processing/logic to
these FunctionQuery results? What's the extension point in Solr for such scenarios?

Basically I want to call termfreq() for each document and then apply the
sum to all doc's termfreq() results and show in one aggregated TermFreq
field in my query response.

Thanks.
Tony
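
One possible route, sketched under the assumption of the Lucene/Solr 4.x
function-query API (the product/spider term and the "sumTermFreq" response key
are illustrative): evaluate the function yourself inside process() via a
ValueSource, instead of looking for it on the stored document.

import java.util.List;
import java.util.Map;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.ReaderUtil;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.valuesource.TermFreqValueSource;
import org.apache.lucene.util.BytesRef;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.SolrIndexSearcher;

// Inside SearchComponent.process(ResponseBuilder rb):
SolrIndexSearcher searcher = rb.req.getSearcher();
ValueSource vs = new TermFreqValueSource("product", "spider", "product", new BytesRef("spider"));
Map context = ValueSource.newContext(searcher);
vs.createWeight(context, searcher);

List<AtomicReaderContext> leaves = searcher.getTopReaderContext().leaves();
long sum = 0;
DocIterator it = rb.getResults().docList.iterator();
while (it.hasNext()) {
  int docId = it.nextDoc();
  // map the global doc id onto its index segment
  AtomicReaderContext leaf = leaves.get(ReaderUtil.subIndex(docId, leaves));
  FunctionValues values = vs.getValues(context, leaf);
  sum += values.intVal(docId - leaf.docBase);
}
rb.rsp.add("sumTermFreq", sum); // one aggregated value in the response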



On Tue, Jul 16, 2013 at 6:01 PM, Jack Krupansky j...@basetechnology.com wrote:

 Basically, the evaluation of function queries in the fl parameter occurs
 when the response writer is composing the document results. That's AFTER
 all of the search components are done.

 SolrReturnFields.getTransformer() gets the DocTransformer, which is
 really a DocTransformers, and then a call to DocTransformers.transform() in
 each response writer will evaluate the embedded function queries and insert
 their values in the results as they are being written.

 -- Jack Krupansky

 -Original Message- From: Tony Mullins
 Sent: Tuesday, July 16, 2013 1:37 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Doc's FunctionQuery result field in my custom SearchComponent
 class ?


 No sorry, I am still not getting the termfreq() field in my 'doc' object.
 I do get the _version_ field in my 'doc' object which I think is
 realValue=StoredField.

 At which point does termfreq() or any other FunctionQuery field become part
 of the doc object in Solr? And at that point can I perform some custom logic
 and append it to the response?

 Thanks.
 Tony





 On Tue, Jul 16, 2013 at 1:34 AM, Patanachai Tangchaisin 
 patanachai.tangchai...@wizecommerce.com
 wrote:

  Hi,

 I think the process of retrieving a stored field (through fl) happens
 after SearchComponent.

 One solution: if you wrap the q param with a function, your score will be
 the result of the function.
 For example,

 http://localhost:8080/solr/collection2/demoendpoint?q=termfreq%28product,%27spider%27%29&wt=xml&indent=true&fl=*,score
 



 Now your score is going to be a result of termfreq(product,'spider')


 --
 Patanachai Tangchaisin



 On 07/15/2013 12:01 PM, Tony Mullins wrote:

  any help plz !!!


 On Mon, Jul 15, 2013 at 4:13 PM, Tony Mullins tonymullins...@gmail.com wrote:


  Please any help on how to get the value of 'freq' field in my custom

 SearchComponent ?


 http://localhost:8080/solr/collection2/demoendpoint?q=spider&wt=xml&indent=true&fl=*,freq:termfreq%28product,%27spider%27%29
 


 <doc><str name="id">11</str><str name="type">Video Games</str><str
 name="format">xbox 360</str><str name="product">The Amazing
 Spider-Man</str><int name="popularity">11</int><long
 name="_version_">1439994081345273856</long><int
 name="freq">1</int></doc>



 Here is my code

 DocList docs = rb.getResults().docList;
  DocIterator iterator = docs.iterator();
  int sumFreq = 0;
  String id = null;

  for (int i = 0; i  docs.size(); i++) {
  try {
  int docId = iterator.nextDoc();

 // Document doc = searcher.doc(docId, fieldSet);
  Document doc = searcher.doc(docId);

 In doc object I can see the schema fields like 'id', 'type','format'
 etc.
 but I cannot find the field 'freq' which I needed. Is there any way to
 get
 the FunctionQuery fields in doc object ?

 Thanks,
 Tony



 On Mon, Jul 15, 2013 at 1:16 PM, Tony Mullins tonymullins...@gmail.com wrote:

  Hi,


 I have extended Solr's SearchComonent class and I am iterating through
 all the docs in ResponseBuilder in @overrider Process() method.

 Here I want to get the value of FucntionQuery result but in Document
 object I am only seeing the standard field of document not the
 FucntionQuery result.

 This is my query


 http://localhost:8080/solr/collection2/demoendpoint?q=spider&wt=xml&indent=true&fl=*,freq:termfreq%28product,%27spider%27%29
 


 Result of above query in browser shows me that 'freq' is part of doc
 but its not there in Document object in 

Highlighting externally stored text

2013-07-16 Thread JohnRodey
Does anyone know if Issue SOLR-1397 ("It should be possible to highlight
external text") is actively being worked on, by chance?  Looks like the last
update was May 2012.
https://issues.apache.org/jira/browse/SOLR-1397

I'm trying to find a way to best highlight search results even though those
results are not stored in my index.  Has anyone been successful in reusing
the SOLR highlighting logic on non-stored data?  Does anyone know if there
are any other third party libraries that can do this for me until 1397 is
formally released?

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-externally-stored-text-tp4078387.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to change extracted directory

2013-07-16 Thread Shawn Heisey

On 7/16/2013 2:02 AM, wolbi wrote:

As I said, if I change it in context.xml it works... but the question is...
how to do it from the command line, without modifying config files.
Thanks


Take it out of the config file.

Thanks,
Shawn



Re: solr 4.3.1 Installation

2013-07-16 Thread Sandeep Gupta
This problem looks to me to be because of solr logging ...
see the detailed description below (taken from another mail thread)

-
Solr 4.3.0 and later does not have ANY slf4j jarfiles in the .war file,
so you need to put them in your classpath.  Jarfiles are included in the
example, in example/lib/ext, and those jarfiles set up logging to use
log4j, a much more flexible logging framework than JDK logging.

JDK logging is typically set up with a file called logging.properties,
which I think you must use a system property to configure.  You aren't
using JDK logging, you are using log4j, which uses a file called
log4j.properties.

http://wiki.apache.org/solr/SolrLogging#Using_the_example_logging_setup_in_containers_other_than_Jetty




On Tue, Jul 16, 2013 at 6:28 PM, Sujatha Arun suja.a...@gmail.com wrote:

 Hi ,

 We have been using Solr 3.6.1. Recently we downloaded the Solr 4.3.1 version
  and installed it as a multicore setup as follows

 Folder Structure
 solr.war
 solr
  conf
core0
 core1
 solr.xml

 Created the context fragment xml file in tomcat/conf/catalina/localhost
 which refers to the solr.war file and the solr home folder

 copied the multicore conf folder without the zoo.cfg file

 I get the following error and the admin page does not load
 16 Jul, 2013 11:36:09 PM org.apache.catalina.core.StandardContext start
 SEVERE: Error filterStart
 16 Jul, 2013 11:36:09 PM org.apache.catalina.core.StandardContext start
 SEVERE: Context [/solr_4.3.1] startup failed due to previous errors
 16 Jul, 2013 11:36:39 PM org.apache.catalina.startup.HostConfig
 checkResources
 INFO: Undeploying context [/solr_4.3.1]
 16 Jul, 2013 11:36:39 PM org.apache.catalina.core.StandardContext start
 SEVERE: Error filterStart
 16 Jul, 2013 11:36:39 PM org.apache.catalina.core.StandardContext start
 SEVERE: Context [/solr_4.3.1] startup failed due to previous errors


 Please let me know what I am missing, if I need to install this with the
 default multicore setup without the cloud. Thanks

 Regards
 Sujatha



Re: ACL implementation: Pseudo-join performance & Atomic Updates

2013-07-16 Thread Roman Chyla
Erick,

I wasn't sure this issue was important, so I wanted to first solicit some
feedback. You and Otis expressed interest, and I could create the JIRA -
however, as Alexandre points out, SOLR-1913 seems similar (actually,
closer to Otis's request to have the elasticsearch named filter), but
SOLR-1913 was created in 2010 and is not integrated yet, so I am wondering
whether this new feature (somewhat overlapping, but still different from
SOLR-1913) is something people would really want and whether the effort on
the JIRA is well spent. What's your view?

Thanks,

  roman




On Tue, Jul 16, 2013 at 8:23 AM, Alexandre Rafalovitch
arafa...@gmail.com wrote:

 Is that this one: https://issues.apache.org/jira/browse/SOLR-1913 ?

 Regards,
Alex.

 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


 On Tue, Jul 16, 2013 at 8:01 AM, Erick Erickson erickerick...@gmail.com
 wrote:

  Roman:
 
  Did this ever make into a JIRA? Somehow I missed it if it did, and this
  would
  be pretty cool
 
  Erick
 
  On Mon, Jul 15, 2013 at 6:52 PM, Roman Chyla roman.ch...@gmail.com
  wrote:
   On Sun, Jul 14, 2013 at 1:45 PM, Oleg Burlaca oburl...@gmail.com
  wrote:
  
   Hello Erick,
  
Join performance is most sensitive to the number of values
in the field being joined on. So if you have lots and lots of
distinct values in the corpus, join performance will be affected.
   Yep, we have a list of unique Id's that we get by first searching for
   records
   where loggedInUser IS IN (userIDs)
   This corpus is stored in memory I suppose? (not a problem) and then
 the
   bottleneck is to match this huge set with the core where I'm
 searching?
  
   Somewhere in maillist archive people were talking about external list
  of
   Solr unique IDs
   but didn't find if there is a solution.
   Back in 2010 Yonik posted a comment:
   http://find.searchhub.org/document/363a4952446b3cd#363a4952446b3cd
  
  
   sorry, I don't have the previous thread in its entirety, but a few weeks
   back Yonik's proposal got implemented, it seems ;)
  
  
 
 http://search-lucene.com/m/Fa3Dg14mqoj/bitsetsubj=Re+Solr+large+boolean+filter
  
   You could use this to send very large bitset filter (which can be
   translated into any integers, if you can come up with a mapping
  function).
  
   roman
  
  
  
bq: I suppose the delete/reindex approach will not change soon
There is ongoing work (search the JIRA for Stacked Segments)
   Ah, ok, I was feeling it affects the architecture, ok, now the only
  hope is
   Pseudo-Joins ))
  
One way to deal with this is to implement a post filter, sometimes
   called
a no cache filter.
   thanks, will have a look, but as you describe it, it's not the best
  option.
  
   The approach
   too many documents, man. Please refine your query. Partial results
  below
   means faceting will not work correctly?
  
   ... I have in mind a hybrid approach, comments welcome:
   Most of the time users are not searching, but browsing content, so our
   virtual filesystem stored in SOLR will use only the index with the
 Id
  of
   the file and the list of users that have access to it. i.e. not
 touching
   the fulltext index at all.
  
   Files may have metadata (EXIF info for images for ex) that we'd like
 to
   filter by, calculate facets.
   Meta will be stored in both indexes.
  
   In case of a fulltext query:
   1. search FT index (the fulltext index), get only the number of search
   results, let it be Rf
   2. search DAC index (the index with permissions), get number of search
   results, let it be Rd
  
   let maxR be the maximum size of the corpus for the pseudo-join.
   *That was actually my question: what is a reasonable number? 10, 100,
  1000
   ?
   *
  
    if (Rf < maxR) or (Rd < maxR) then use the smaller corpus to join onto
  the
   second one.
   this happens when (only a few documents contains the search query) OR
  (user
   has access to a small number of files).
  
   In case none of these happens, we can use the
   too many documents, man. Please refine your query. Partial results
  below
   but first searching the FT index, because we want relevant results
  first.
  
   What do you think?
  
   Regards,
   Oleg
  
  
  
  
   On Sun, Jul 14, 2013 at 7:42 PM, Erick Erickson 
  erickerick...@gmail.com
   wrote:
  
Join performance is most sensitive to the number of values
in the field being joined on. So if you have lots and lots of
distinct values in the corpus, join performance will be affected.
   
bq: I suppose the delete/reindex approach will not change soon
   
There is ongoing work (search the JIRA for Stacked Segments)
on actually doing something about this, but it's been under
   consideration
for at least 3 years so your guess is as good as 

Re: Live reload

2013-07-16 Thread Alexandre Rafalovitch
Are you using synonyms during indexing or during query only? If during
indexing, the reloading by itself will not change what was stored - you
need to fully reindex as well.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Tue, Jul 16, 2013 at 7:46 AM, O. Klein kl...@octoweb.nl wrote:

 I used the reload command to apply changes in synonyms.txt for example, but
 with the new mechanism 
 https://wiki.apache.org/solr/CoreAdmin#LiveReload
 this will not work anymore.

 Is there another way to reload config files instead of restarting Solr?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Live-reload-tp4078318.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Usage of luceneMatchVersion when upgrading from solr 3.6 to solr 4.3

2013-07-16 Thread Zhang, Lisheng
Hi,
 
We are upgrading solr from 3.6 to 4.3, but we have a large amount of indexed 
data and could not
afford to reindex it all at once.
 
We wish solr 4.3 could do the following:
 
1/ still able to search on solr 3.6 indexed data
2/ whenever indexing a new document, convert it to the 4.3 format (this may not 
happen all at once)
 
In this case, should we use LUCENE_36 or LUCENE_43 for luceneMatchVersion (it 
is suggested
that we should reindex all data if using LUCENE_43, so I think we should use 
LUCENE_36, since
we cannot reindex it all at once, right)?
 
Thanks very much for helps, Lisheng
 
 


Re: Live reload

2013-07-16 Thread O. Klein
My bad. I did some more testing as well and could not replicate the behavior.

Reloading synonyms works fine with a core reload.



Chris Hostetter-3 wrote
 : I used the reload command to apply changes in synonyms.txt for example,
 but
 : with the new mechanism
 <https://wiki.apache.org/solr/CoreAdmin#LiveReload>  
 : this will not work anymore.
 
 the Live reload doesn't affect schema.xml settings and analyzers (like 
 changing stopwords or synonyms) ... when you reload, you should see your 
 new synonyms.txt file loaded.
 
 if you don't think you are seeing that behavior, then you need to provide 
 a lot more details about what version you are using, what steps you are 
 trying, and what behavior you *are* seeing so that we can understand what 
 problem you might be having...  
 
 https://wiki.apache.org/solr/UsingMailingLists
 
 i just did a simple sanity test on the 4x branch where i ran some stuff 
 through the analyzer UI screen, then changed the synonyms file and did a 
 reload and saw the changes i expected when i re-loaded the analysis page.
 
 
 
 -Hoss





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Live-reload-tp4078318p4078400.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Range query on a substring.

2013-07-16 Thread Roman Chyla
Well, I think this is slightly too categorical - a range query on a
substring can be thought of as a simple range query. So, for example the
following query:

lucene 1*

becomes behind the scenes: lucene (10|11|12|13|14|1abcd)

the issue there is that it is a string range, but it is a range query - it
just has to be indexed in a clever way

So, Marcin, you still have quite a few options besides the strict boolean
query model

1. have a special tokenizer chain which creates one token out of these
groups (eg. "some text prefix_1") and search for "some text prefix_*" [and
do some post-filtering if necessary]
2. another version, using regex /some text (1|2|3...)/ - you got the idea
3. construct the lucene multi-term range query automatically, in your
qparser - to produce a phrase query lucene (10|11|12|13|14) (see the sketch
below)
4. use payloads to index your integer at the position of "some text" and
then retrieve only "some text" where the payload is in range x-y - an
example is here, look at getPayloadQuery()
https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/test/org/adsabs/lucene/BenchmarkAuthorSearch.java
- but this is a more complex situation and if you google, you will find a
better description
5. use a qparser that is able to handle nested search and analysis at the
same time - eg. your query is: field:"some text" NEAR1 field:[0 TO 10] - i
know about a parser that can handle this and i invite others to check it
out (yeah, JIRA tickets need reviewers ;-))
https://issues.apache.org/jira/browse/LUCENE-5014

there might be others i forgot, but it is certainly doable; but as Jack
points out, you may want to stop for a moment to reflect whether it is
necessary

HTH,

  roman
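
A tiny Lucene sketch of option 3 above (assuming the Lucene 4.x API; the field
name and number terms are illustrative):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.MultiPhraseQuery;

// "lucene (10|11|12)": the first position must be "lucene", the second
// position may be any of the enumerated number tokens.
MultiPhraseQuery q = new MultiPhraseQuery();
q.add(new Term("myfield", "lucene"));
q.add(new Term[] {
    new Term("myfield", "10"),
    new Term("myfield", "11"),
    new Term("myfield", "12"),
});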


On Tue, Jul 16, 2013 at 8:35 AM, Jack Krupansky j...@basetechnology.com wrote:

 Sorry, but you are basically misusing Solr (and multivalued fields),
 trying to take a shortcut to avoid a proper data model.

 To properly use Solr, you need to put each of these multivalued field
 values in a separate Solr document, with a "text" field and a "value"
 field. Then, you can query:

    text:"some text" AND value:[min-value TO max-value]

 Exactly how you should restructure your data model is dependent on all of
 your other requirements.

 You may be able to simply flatten your data.

 You may be able to use a simple join operation.

 Or, maybe you need to do a multi-step query operation if you data is
 sufficiently complex.

 If you want to keep your multivalued field in its current form for display
 purposes or keyword search, or exact match search, fine, but your stated
 goal is inconsistent with the Semantics of Solr and Lucene.

 To be crystal clear, there is no such thing as a "range query on a
 substring" in Solr or Lucene.

 -- Jack Krupansky

 -Original Message- From: Marcin Rzewucki
 Sent: Tuesday, July 16, 2013 5:13 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Range query on a substring.


 By multivalued I meant an array of values. For example:
<arr name="myfield">
 <str>text1 (X)</str>
 <str>text2 (Y)</str>
</arr>

 I'd like to avoid spliting it as you propose. I have 2.3mn collection with
 pretty large records (few hundreds fields and more per record). Duplicating
 them would impact performance.

 Regards.



 On 16 July 2013 10:26, Oleg Burlaca oburl...@gmail.com wrote:

  Ah, you mean something like this:
 record:
 Id=10, text =  this is a text N1 (X), another text N2 (Y), text N3 (Z)
 Id=11, text =  this is a text N1 (W), another text N2 (Q), third text
 (M)

 and you need to search for: "text N1" and X < B ?
 How big is the core? The first thing that comes to my mind, again, at
 indexing level,
 split the text into pieces and index it in solr like this:

 record_id | text  | value
 10   | text N1 | X
 10   | text N2 | Y
 10   | text N3 | Z

 does it help?



 On Tue, Jul 16, 2013 at 10:51 AM, Marcin Rzewucki mrzewu...@gmail.com
 wrote:

  Hi Oleg,
  It's a multivalued field and it won't be easier to query when I split
 this
  field into text and numbers. I may get wrong results.
 
  Regards.
 
 
  On 16 July 2013 09:35, Oleg Burlaca oburl...@gmail.com wrote:
 
   IMHO the number(s) should be extracted and stored in separate columns
 in
   SOLR at indexing time.
  
   --
   Oleg
  
  
   On Tue, Jul 16, 2013 at 10:12 AM, Marcin Rzewucki 
 mrzewu...@gmail.com
   wrote:
  
Hi,
   
I have a problem (wonder if it is possible to solve it at all) with
 the
following query. There are documents with a field which contains a
 text
   and
a number in brackets, eg.
   
myfield: this is a text (number)
   
There might be some other documents with the same text but different
   number
in brackets.
I'd like to find documents with the given text say this is a text
 and
number between A and B. Is it possible in Solr ? Any ideas ?
   
Kind regards.
   
  
 





Re: Solr is not responding on deployment in tomcat

2013-07-16 Thread Per Newgro

Thanks Eric,

I've configured both to use 8080 (for Wicket this is standard :-)).

Do I have to assign a different port to Solr if I use both webapps in 
the same container?

Btw. the context path for my wicket app is /*
Could that be a problem too?

Per

On 15.07.2013 17:12, Erick Erickson wrote:

Sounds like Wicket and Solr are using the same port(s)...

If you start Wicket first then look at the Solr logs, you might
see some message about port already in use or some such.

If this is SolrCloud, there are also the ZooKeeper ports to
wonder about.

Best
Erick

On Mon, Jul 15, 2013 at 6:49 AM, Per Newgro per.new...@gmx.ch wrote:

Hi,

maybe someone here can help me with my solr-4.3.1 issue.

I've successful deployed the solr.war on a tomcat7 instance.
Starting the tomcat with only the solr.war deployed - works nicely.
I can see the admin interface and logs are clean.

If i
deploy my wicket-spring-data-solr based app (using the HttpSolrServer)
after the solr app
without restarting the tomcat
= all is fine to.

I've implemented a ping to see if server is up.

code
    private void waitUntilSolrIsAvailable(int i) {
        if (i == 0) {
            logger.info("Check solr state...");
        }
        if (i > 5) {
            throw new RuntimeException("Solr is not available after "
                    + "more than 25 secs. Going down now.");
        }
        if (i > 0) {
            try {
                logger.info("Wait for solr to get alive.");
                Thread.sleep(5000); // pause before the next ping attempt
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        try {
            i++;
            SolrPingResponse r = solrServer.ping();
            if (r.getStatus() > 0) {
                waitUntilSolrIsAvailable(i);
            }
            logger.info("Solr is alive.");
        } catch (SolrServerException | IOException e) {
            throw new RuntimeException(e);
        }
    }
/code

Here i can see log
log
54295 [localhost-startStop-2] INFO  org.apache.wicket.Application  – 
[wicket.project] init: Wicket extensions initializer
INFO  - 2013-07-15 12:07:45.261; 
de.company.service.SolrServerInitializationService; Check solr state...
54505 [localhost-startStop-2] INFO  
de.company.service.SolrServerInitializationService  – Check solr state...
INFO  - 2013-07-15 12:07:45.768; org.apache.solr.core.SolrCore; [collection1] 
webapp=/solr path=/admin/ping params={wt=javabin&version=2} hits=0 status=0 
QTime=20
55012 [http-bio-8080-exec-1] INFO  org.apache.solr.core.SolrCore  – [collection1] 
webapp=/solr path=/admin/ping params={wt=javabin&version=2} hits=0 status=0 
QTime=20
INFO  - 2013-07-15 12:07:45.770; org.apache.solr.core.SolrCore; [collection1] 
webapp=/solr path=/admin/ping params={wt=javabin&version=2} status=0 QTime=22
55014 [http-bio-8080-exec-1] INFO  org.apache.solr.core.SolrCore  – [collection1] 
webapp=/solr path=/admin/ping params={wt=javabin&version=2} status=0 QTime=22
INFO  - 2013-07-15 12:07:45.854; 
de.company.service.SolrServerInitializationService; Solr is alive.
55098 [localhost-startStop-2] INFO  
de.company.service.SolrServerInitializationService  – Solr is alive.
/log

But if i
restart the tomcat
with both webapps (solr and wicket)
the solr is not responding on the ping request.

log
INFO  - 2013-07-15 12:02:27.634; org.apache.wicket.Application; 
[wicket.project] init: Wicket extensions initializer
11932 [localhost-startStop-1] INFO  org.apache.wicket.Application  – 
[wicket.project] init: Wicket extensions initializer
INFO  - 2013-07-15 12:02:27.787; 
de.company.service.SolrServerInitializationService; Check solr state...
12085 [localhost-startStop-1] INFO  
de.company.service.SolrServerInitializationService  – Check solr state...
/log

What could that be, or how can I get info on where this is stopping?

Thanks for your support
Per




Re: solr 4.3.1 Installation

2013-07-16 Thread Sujatha Arun
Thanks Sandeep,that fixed it.

Regards,
Sujatha


On Tue, Jul 16, 2013 at 10:41 PM, Sandeep Gupta gupta...@gmail.com wrote:

 This problem looks to me to be because of solr logging ...
 see the detailed description below (taken from another mail thread)


 -
 Solr 4.3.0 and later does not have ANY slf4j jarfiles in the .war file,
 so you need to put them in your classpath.  Jarfiles are included in the
 example, in example/lib/ext, and those jarfiles set up logging to use
 log4j, a much more flexible logging framework than JDK logging.

 JDK logging is typically set up with a file called logging.properties,
 which I think you must use a system property to configure.  You aren't
 using JDK logging, you are using log4j, which uses a file called
 log4j.properties.


 http://wiki.apache.org/solr/SolrLogging#Using_the_example_logging_setup_in_containers_other_than_Jetty




 On Tue, Jul 16, 2013 at 6:28 PM, Sujatha Arun suja.a...@gmail.com wrote:

  Hi ,
 
  We have been using Solr 3.6.1. Recently we downloaded the Solr 4.3.1
  version
   and installed it as a multicore setup as follows
 
  Folder Structure
  solr.war
  solr
   conf
 core0
  core1
  solr.xml
 
  Created the context fragment xml file in tomcat/conf/catalina/localhost
  which refers to the solr.war file and the solr home folder
 
  copied the multicore conf folder without the zoo.cfg file
 
  I get the following error and the admin page does not load
  16 Jul, 2013 11:36:09 PM org.apache.catalina.core.StandardContext start
  SEVERE: Error filterStart
  16 Jul, 2013 11:36:09 PM org.apache.catalina.core.StandardContext start
  SEVERE: Context [/solr_4.3.1] startup failed due to previous errors
  16 Jul, 2013 11:36:39 PM org.apache.catalina.startup.HostConfig
  checkResources
  INFO: Undeploying context [/solr_4.3.1]
  16 Jul, 2013 11:36:39 PM org.apache.catalina.core.StandardContext start
  SEVERE: Error filterStart
  16 Jul, 2013 11:36:39 PM org.apache.catalina.core.StandardContext start
  SEVERE: Context [/solr_4.3.1] startup failed due to previous errors
 
 
  Please let me know what I am missing, if I need to install this with the
  default multicore setup without the cloud. Thanks
 
  Regards
  Sujatha
 



Re: [solr 3.4.1] collections: meaning and necessity

2013-07-16 Thread Dmitry Kan
Thanks Alexandre,

Well, the initial question was whether it is possible to altogether avoid
dealing with collections (extra layer, longer url). But it seems this is an
internal new feature of solr 4 generation. In solr 3 it was just a core,
which could be avoided if no solr.xml was found.

With this release my solr terminology has transformed into having some
ambiguous words (collection and core) referring to the same thing. I'm not
even sure what a shard is nowadays :)




On Tue, Jul 16, 2013 at 3:57 PM, Alexandre Rafalovitch
arafa...@gmail.com wrote:

 If you only have one collection and no Solr cloud, then don't use solr.xml
 at all. It will automatically assume 'collection1' as a name.

 If you do want to have some control (shards, etc), do not include the
 optional parameters you do not need. See example here:

 http://my.safaribooksonline.com/book/databases/9781782164845/1dot-instant-apache-solr-for-indexing-data-how-to/ch01s02_html

 You don't even need defaultCoreName attribute, if you are happy to always
 include core name in the URL.

 Regards,
Alex.

 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


 On Tue, Jul 16, 2013 at 7:28 AM, Dmitry Kan solrexp...@gmail.com wrote:

  Sorry, hit send too fast..
 
  picking up:
 
  from the answer by Jayendra on the link, collections and cores are the
 same
  thing. Same is seconded by the config:
 
    <cores adminPath="/admin/cores" defaultCoreName="collection1"
  host="${host:}" hostPort="${jetty.port:8983}"
  hostContext="${hostContext:solr}"
  zkClientTimeout="${zkClientTimeout:15000}">
  <core name="collection1" instanceDir="." />
    </cores>
 
  we basically define cores.
 
  We have a plain {frontend_solr, shards} setup with solr 3.4 and were
  thinking of starting off with it initially in solr 4. In solr 4: can one
  get by without using collections = cores?
 
  We also don't plan on using SolrCloud at the moment. So from our
 standpoint
  the solr4 configuration looks more complicated than that of solr 3.4.
 Are
  there any benefits of such a setup for non SolrCloud users?
 
  Thanks,
 
  Dmitry
 
 
 
  On Tue, Jul 16, 2013 at 2:24 PM, Dmitry Kan solrexp...@gmail.com
 wrote:
 
   Hello list,
  
    Following the answer by Jayendra here:
  
  
  
 
 http://stackoverflow.com/questions/14516279/how-to-add-collections-to-solr-core
  
 



Re: How to use joins in solr 4.3.1

2013-07-16 Thread Utkarsh Sengar
Looks like the JoinQParserPlugin is throwing an NPE.
Query: localhost:8983/solr/location/select?q=*:*fq={!join from=key
to=merchantId fromIndex=merchant}

84343345 [qtp2012387303-16] ERROR org.apache.solr.core.SolrCore  –
java.lang.NullPointerException
at org.apache.solr.search.JoinQuery.hashCode(JoinQParserPlugin.java:580)
at org.apache.solr.search.QueryResultKey.<init>(QueryResultKey.java:50)
at
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1274)
at
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457)
at
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:365)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:662)

84343350 [qtp2012387303-16] INFO  org.apache.solr.core.SolrCore  –
[location] webapp=/solr path=/select
params={distrib=false&wt=javabin&version=2&rows=10&df=allText&fl=key,score&shard.url=x:8983/solr/location/&NOW=1373999694930&start=0&q=*:*&_=1373999505886&isShard=true&fq={!join+from%3Dkey+to%3DmerchantId+fromIndex%3Dmerchant}&fsv=true}
status=500 QTime=6
84343351 [qtp2012387303-16] ERROR
org.apache.solr.servlet.SolrDispatchFilter  –
null:java.lang.NullPointerException
at org.apache.solr.search.JoinQuery.hashCode(JoinQParserPlugin.java:580)
at org.apache.solr.search.QueryResultKey.<init>(QueryResultKey.java:50)
at
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1274)
at
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457)
at
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
at

Re: [solr 3.4.1] collections: meaning and necessity

2013-07-16 Thread Alexandre Rafalovitch
Search this mailing list and you will find a very long discussion about the
terminology and confusion around it.My contribution to that was the crude
picture trying to explain it: http://bit.ly/1aqohUf . Maybe it will help.

If you don't want longer URL, do use solr.xml and use @adminPath and
@defaultCoreName
parameters. But you don't need the rest.

Regards,
   Alex.


Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Tue, Jul 16, 2013 at 2:30 PM, Dmitry Kan solrexp...@gmail.com wrote:

 Thanks Alexandre,

 Well, the initial question was whether it is possible to altogether avoid
 dealing with collections (an extra layer, a longer URL). But it seems this is a
 new internal feature of the solr 4 generation. In solr 3 it was just a core,
 which could be avoided if no solr.xml was found.

 With this release my solr terminology has transformed into having some
 ambiguous words (collection and core) referring to the same thing. I'm not
 even sure what a shard is nowadays :)




 On Tue, Jul 16, 2013 at 3:57 PM, Alexandre Rafalovitch
 arafa...@gmail.comwrote:

  If you only have one collection and no Solr cloud, then don't use
 solr.xml
  at all. It will automatically assume 'collection1' as a name.
 
  If you do want to have some control (shards, etc), do not include the
  optional parameters you do not need. See example here:
 
 
 http://my.safaribooksonline.com/book/databases/9781782164845/1dot-instant-apache-solr-for-indexing-data-how-to/ch01s02_html
 
  You don't even need defaultCoreName attribute, if you are happy to always
  include core name in the URL.
 
  Regards,
 Alex.
 
  Personal website: http://www.outerthoughts.com/
  LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
  - Time is the quality of nature that keeps events from happening all at
  once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
 
 
  On Tue, Jul 16, 2013 at 7:28 AM, Dmitry Kan solrexp...@gmail.com
 wrote:
 
   Sorry, hit send too fast..
  
   picking up:
  
   from the answer by Jayendra on the link, collections and cores are the
  same
   thing. Same is seconded by the config:
  
  <cores adminPath="/admin/cores" defaultCoreName="collection1"
         host="${host:}" hostPort="${jetty.port:8983}"
         hostContext="${hostContext:solr}"
         zkClientTimeout="${zkClientTimeout:15000}">
    <core name="collection1" instanceDir="." />
  </cores>
  
   we basically define cores.
  
   We have a plain {frontend_solr, shards} setup with solr 3.4 and were
   thinking of starting off with it initially in solr 4. In solr 4: can
 one
   get by without using collections = cores?
  
   We also don't plan on using SolrCloud at the moment. So from our
  standpoint
   the solr4 configuration looks more complicated, than that of solr 3.4.
  Are
   there any benefits of such a setup for non SolrCloud users?
  
   Thanks,
  
   Dmitry
  
  
  
   On Tue, Jul 16, 2013 at 2:24 PM, Dmitry Kan solrexp...@gmail.com
  wrote:
  
Hello list,
   
 Following the answer by Jayendra here:
   
   
   
  
 
 http://stackoverflow.com/questions/14516279/how-to-add-collections-to-solr-core
   
  
 



Re: How to use joins in solr 4.3.1

2013-07-16 Thread Utkarsh Sengar
Found this post:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201302.mbox/%3CCAB_8Yd82aqq=oY6dBRmVjG7gvBBewmkZGF9V=fpne4xgkbu...@mail.gmail.com%3E

And based on the answer, I modified my query: localhost:8983/solr/location/
select?fq={!join from=key to=merchantId fromIndex=merchant}*:*

I don't see any errors, but my original problem still persists: no
documents are returned.
The two fields on which I am trying to join are:

Merchant: <field name="merchantId" type="string" indexed="true"
stored="true" multiValued="false" />
Location:  <field name="merchantId" type="string" indexed="false"
stored="true" multiValued="false" />

Thanks,
-Utkarsh


On Tue, Jul 16, 2013 at 11:39 AM, Utkarsh Sengar utkarsh2...@gmail.comwrote:

 Looks like the JoinQParserPlugin is throwing an NPE.
 Query: localhost:8983/solr/location/select?q=*:*&fq={!join from=key
 to=merchantId fromIndex=merchant}

 84343345 [qtp2012387303-16] ERROR org.apache.solr.core.SolrCore  –
 java.lang.NullPointerException
 at
 org.apache.solr.search.JoinQuery.hashCode(JoinQParserPlugin.java:580)
 at org.apache.solr.search.QueryResultKey.<init>(QueryResultKey.java:50)
 at
 org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1274)
 at
 org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457)
 at
 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410)
 at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
 at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
 at
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
 at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
 at
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
 at
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
 at
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:365)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
 at
 org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
 at
 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856)
 at
 org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
 at
 org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
 at
 org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
 at java.lang.Thread.run(Thread.java:662)

 84343350 [qtp2012387303-16] INFO  org.apache.solr.core.SolrCore  –
 [location] webapp=/solr path=/select
 params={distrib=false&wt=javabin&version=2&rows=10&df=allText&fl=key,score&shard.url=x:8983/solr/location/&NOW=1373999694930&start=0&q=*:*&_=1373999505886&isShard=true&fq={!join+from%3Dkey+to%3DmerchantId+fromIndex%3Dmerchant}&fsv=true}
 status=500 QTime=6
 84343351 [qtp2012387303-16] ERROR
 org.apache.solr.servlet.SolrDispatchFilter  –
 null:java.lang.NullPointerException
 at
 org.apache.solr.search.JoinQuery.hashCode(JoinQParserPlugin.java:580)
 at org.apache.solr.search.QueryResultKey.<init>(QueryResultKey.java:50)
 at
 org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1274)
 at
 org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457)
 at
 

Re: [solr 3.4.1] collections: meaning and necessity

2013-07-16 Thread Shawn Heisey

On 7/16/2013 12:41 PM, Alexandre Rafalovitch wrote:

Search this mailing list and you will find a very long discussion about the
terminology and the confusion around it. My contribution to that was the crude
picture trying to explain it: http://bit.ly/1aqohUf . Maybe it will help.

If you don't want a longer URL, do use solr.xml with the @adminPath and
@defaultCoreName parameters. But you don't need the rest.


I'm relatively sure that defaultCoreName isn't there if you use the new 
core discovery mode that is default in the 4.4 example.  This new mode 
will be the only option in 5.0.  The old mode will continue to be 
supported throughout all 4.x versions.


I think getting rid of defaultCoreName is the right move - Solr has been 
multicore in the standard example for quite some time.  Accessing Solr 
without a corename in the URL is a source of confusion for users when 
they venture outside the collection1 core that comes with the default 
example.


IMHO, the additional capability and confusion inherent with SolrCloud 
makes it even more important that the user include a collection/core 
name when making their request.


Thanks,
Shawn



SolrCloud Zookeeper SaslClient

2013-07-16 Thread kowish.adamosh
Hi,

Is there any documentation of how to configure SolrCloud Zookeeper using
SASL (on JBOSS 5). When I start SolrCloud on Jboss 5 I see WARN:

2013-07-16 21:38:17,425 INFO
[org.apache.solr.common.cloud.ConnectionManager:157] (main) Waiting for
client to connect to ZooKeeper
2013-07-16 21:38:17,437 WARN
[org.apache.zookeeper.client.ZooKeeperSaslClient:437]
(main-SendThread(localhost:2181)) Could not login: the client is being asked
for a password, but the Zookeeper client code does not currently support
obtaining a password from the user. Make sure that the client is configured
to use a ticket cache (using the JAAS configuration setting
'useTicketCache=true)' and restart the client. If you still get this message
after that, the TGT in the ticket cache has expired and must be manually
refreshed. To do so, first determine if you are using a password or a keytab.
If the former, run kinit in a Unix shell in the environment of the user who
is running this Zookeeper client using the command 'kinit <princ>' (where
<princ> is the name of the client's Kerberos principal). If the latter, do
'kinit -k -t <keytab> <princ>' (where <princ> is the name of the Kerberos
principal, and <keytab> is the location of the keytab file). After manually
refreshing your cache, restart this client. If you continue to see this
message after manually refreshing your cache, ensure that your KDC host's
clock is in sync with this host's clock.
2013-07-16 21:38:17,438 WARN  [org.apache.zookeeper.ClientCnxn:949]
(main-SendThread(localhost:2181)) SASL configuration failed:
javax.security.auth.login.FailedLoginException: Password Incorrect/Password
Required. Will continue connection to Zookeeper server without SASL
authentication, if Zookeeper server allows it.

Any example or tutorial? I'd like to configure it to be secured :-)
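From what I can tell, the client is looking for a JAAS "Client" login section.
A minimal sketch of one for ZooKeeper's digest scheme (the file path, username
and password below are placeholders), enabled by starting the JVM with
-Djava.security.auth.login.config=/path/to/jaas.conf:

Client {
  org.apache.zookeeper.server.auth.DigestLoginModule required
  username="solr"
  password="changeme";
};

(A Kerberos setup would use com.sun.security.auth.module.Krb5LoginModule here
instead.)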

Kowish






JVM Crashed - SOLR deployed in Tomcat

2013-07-16 Thread neoman
Hello Everyone,
We are using solrcloud with Tomcat in our production environment.  
Here is our configuration.
solr-4.0.0
JVM 1.6.0_25

The JVM keeps crashing everyday with the following error. I think it is
happening while we try index the data with solrj APIs.

INFO: [aq-core] webapp=/solr path=/update
params={distrib.from=http://solr03-prod:8080/solr/aq-core/update/&update.distrib=TOLEADER&wt=javabin&version=2}
status=0 QTime=1 
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xfd7ffadac771, pid=2411, tid=33662
#
# JRE version: 6.0_25-b06
# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.0-b11 mixed mode
solaris-amd64 compressed oops)
# Problematic frame:
# J 
org.apache.lucene.codecs.PostingsConsumer.merge(Lorg/apache/lucene/index/MergeState;Lorg/apache/lucene/index/DocsEnum;Lorg/apache/lucene/util/FixedBitSet;)Lorg/apache/lucene/codecs/TermStats;
#
# An error report file with more information is saved as:
# /opt/tomcat/hs_err_pid2411.log
Jul 16, 2013 6:27:07 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start
commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp



Instructions: (pc=0xfd7ffadac771)
0xfd7ffadac751:   89 4c 24 30 4c 89 44 24 28 4c 89 54 24 18 44 89
0xfd7ffadac761:   5c 24 20 4c 8b 57 10 4d 63 d9 49 8b ca 49 03 cb
0xfd7ffadac771:   44 0f be 01 45 8b d9 41 ff c3 44 89 5f 18 45 85
0xfd7ffadac781:   c0 0f 8c b0 05 00 00 45 8b d0 45 8b da 41 d1 eb 

Register to memory mapping:

RAX=0x14008cf2 is an unknown value
RBX=
[error occurred during error reporting (printing register info), id 0xb]

Stack: [0xfd7de4eff000,0xfd7de4fff000],  sp=0xfd7de4ffe140, 
free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native
code)
J 
org.apache.lucene.codecs.PostingsConsumer.merge(Lorg/apache/lucene/index/MergeState;Lorg/apache/lucene/index/DocsEnum;Lorg/apache/lucene/util/FixedBitSet;)Lorg/apache/lucene/codecs/TermStats;

Please let me know if anyone has seen this before. Any input is appreciated. 





Re: [solr 3.4.1] collections: meaning and necessity

2013-07-16 Thread Dmitry Kan
Thanks Alexandre, I think I have followed that discussion; there was
another one, AFAIR, on the dev list.

On your diagram, am I guessing correctly that shard1 and shard2 inside
a collection would at least share the same schema?


On Tue, Jul 16, 2013 at 9:41 PM, Alexandre Rafalovitch
arafa...@gmail.comwrote:

 Search this mailing list and you will find a very long discussion about the
 terminology and the confusion around it. My contribution to that was the crude
 picture trying to explain it: http://bit.ly/1aqohUf . Maybe it will help.

 If you don't want a longer URL, do use solr.xml with the @adminPath and
 @defaultCoreName parameters. But you don't need the rest.

 Regards,
Alex.


 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


 On Tue, Jul 16, 2013 at 2:30 PM, Dmitry Kan solrexp...@gmail.com wrote:

  Thanks Alexandre,
 
  Well, the initial question was whether it is possible to altogether avoid
  dealing with collections (an extra layer, a longer URL). But it seems this
  is a new internal feature of the solr 4 generation. In solr 3 it was just
  a core, which could be avoided if no solr.xml was found.
 
  With this release my solr terminology has transformed into having some
  ambiguous words (collection and core) referring to the same thing. I'm
  not even sure what a shard is nowadays :)
 
 
 
 
  On Tue, Jul 16, 2013 at 3:57 PM, Alexandre Rafalovitch
  arafa...@gmail.comwrote:
 
   If you only have one collection and no Solr cloud, then don't use
  solr.xml
   at all. It will automatically assume 'collection1' as a name.
  
   If you do want to have some control (shards, etc), do not include the
   optional parameters you do not need. See example here:
  
  
 
 http://my.safaribooksonline.com/book/databases/9781782164845/1dot-instant-apache-solr-for-indexing-data-how-to/ch01s02_html
  
   You don't even need defaultCoreName attribute, if you are happy to
 always
   include core name in the URL.
  
   Regards,
  Alex.
  
   Personal website: http://www.outerthoughts.com/
   LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
   - Time is the quality of nature that keeps events from happening all at
   once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)
  
  
   On Tue, Jul 16, 2013 at 7:28 AM, Dmitry Kan solrexp...@gmail.com
  wrote:
  
Sorry, hit send too fast..
   
picking up:
   
from the answer by Jayendra on the link, collections and cores are
 the
   same
thing. Same is seconded by the config:
   
  <cores adminPath="/admin/cores" defaultCoreName="collection1"
         host="${host:}" hostPort="${jetty.port:8983}"
         hostContext="${hostContext:solr}"
         zkClientTimeout="${zkClientTimeout:15000}">
    <core name="collection1" instanceDir="." />
  </cores>
   
we basically define cores.
   
We have a plain {frontend_solr, shards} setup with solr 3.4 and were
thinking of starting off with it initially in solr 4. In solr 4: can
  one
get by without using collections = cores?
   
We also don't plan on using SolrCloud at the moment. So from our
   standpoint
the solr4 configuration looks more complicated, than that of solr
 3.4.
   Are
there any benefits of such a setup for non SolrCloud users?
   
Thanks,
   
Dmitry
   
   
   
On Tue, Jul 16, 2013 at 2:24 PM, Dmitry Kan solrexp...@gmail.com
   wrote:
   
 Hello list,

  Following the answer by Jayendra here:



   
  
 
 http://stackoverflow.com/questions/14516279/how-to-add-collections-to-solr-core

   
  
 



Re: Partial Matching in both query and field

2013-07-16 Thread James Bathgate
I figured it out, for anyone finding this thread: I had to add the following
to my solrconfig.xml:

<luceneMatchVersion>LUCENE_31</luceneMatchVersion>


James Bathgate | Sr. Developer
http://www.searchspring.net | 888.643.9043 ext. 610 | http://www.linkedin.com/in/bathgate


On Thu, Jul 11, 2013 at 2:47 PM, James Bathgate ja...@b7interactive.comwrote:

 1. My general process for a schema change (I know it's overkill) is to delete
 the data directory, reload, index data, and reload again.

 2. I'm using schema version 1.5 on Solr 3.6.2.

 <schema name="SearchSpringDefault" version="1.5">

 3. LuceneQParser, but I've also tried dismax and edismax.

 Here's my solrQueryParser field in my schema, I think OR is correct for
 this.
 <solrQueryParser defaultOperator="OR"/>

 James



 James Bathgate | Sr. Developer

 Toll Free (888) 643-9043 x610 - Fax (719) 358-2027

 4291 Austin Bluffs Pkwy #206 | Colorado Springs, CO 80918
 www.searchspring.net   http://www.searchspring.net


 On Thu, Jul 11, 2013 at 2:29 PM, Jack Krupansky 
 j...@basetechnology.comwrote:

 A couple of possibilities:

 1. Make sure to reload the core.
 2. Check that the Solr schema version is new enough to recognize
 autoGeneratePhraseQueries.
 3. What query parser are you using?


 -- Jack Krupansky

 -Original Message- From: James Bathgate
 Sent: Thursday, July 11, 2013 5:26 PM

 To: solr-user@lucene.apache.org
 Subject: Re: Partial Matching in both query and field

 I just noticed I pasted the wrong fieldType with the extra tokenizer not
 commented out.

   <fieldType name="ngram" class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="false">
     <analyzer type="index">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
       <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="1" splitOnCaseChange="0"
        splitOnNumerics="0" preserveOriginal="0"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.PatternReplaceFilterFactory" pattern="0"
        replacement="o" replace="all"/>
       <filter class="solr.PatternReplaceFilterFactory" pattern="1|l"
        replacement="i" replace="all"/>
       <filter class="solr.NGramFilterFactory" minGramSize="4"
        maxGramSize="16"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.NGramTokenizerFactory" minGramSize="4"
        maxGramSize="16"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true"/>
       <filter class="solr.PatternReplaceFilterFactory"
        pattern="[^A-Za-z0-9]+" replacement="" replace="all"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.PatternReplaceFilterFactory" pattern="0"
        replacement="o" replace="all"/>
       <filter class="solr.PatternReplaceFilterFactory" pattern="1|l"
        replacement="i" replace="all"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
   </fieldType>



 James Bathgate | Sr. Developer

 Toll Free (888) 643-9043 x610 - Fax (719) 358-2027

 4291 Austin Bluffs Pkwy #206 | Colorado Springs, CO 80918
 www.searchspring.net   http://www.searchspring.net



 On Thu, Jul 11, 2013 at 2:15 PM, James Bathgate ja...@b7interactive.com
 **wrote:

  Jack,

 This still isn't working. I just upgraded to 3.6.2 to verify that wasn't
 the issue.

 Here's query information:

 <lst name="params">
   <str name="debugQuery">on</str>
   <str name="indent">on</str>
   <str name="start">0</str>
   <str name="q">0_extrafield1_n:20454</str>
   <str name="rows">10</str>
   <str name="version">2.2</str>
 </lst>
 </lst>
 <result name="response" numFound="0" start="0"/>
 <lst name="debug">
   <str name="rawquerystring">0_extrafield1_n:20454</str>
   <str name="querystring">0_extrafield1_n:20454</str>
   <str name="parsedquery">PhraseQuery(0_extrafield1_n:"2o45 o454 2o454")</str>
   <str name="parsedquery_toString">0_extrafield1_n:"2o45 o454 2o454"</str>
   <lst name="explain"/>
   <str name="QParser">LuceneQParser</str>


 Here's the applicable lines from schema.xml:

 <fieldType name="ngram" class="solr.TextField"
  positionIncrementGap="100" autoGeneratePhraseQueries="false">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
      words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
      ignoreCase="true" expand="true"/>
     <filter class="solr.WordDelimiterFilterFactory"
      generateWordParts="1" generateNumberParts="1" catenateWords="1"
      catenateNumbers="1" catenateAll="1" splitOnCaseChange="0"
      splitOnNumerics="0" preserveOriginal="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
 

Re: [solr 3.4.1] collections: meaning and necessity

2013-07-16 Thread Dmitry Kan
Hi Shawn,

Thanks for your input.

Having spent some time today figuring out the path to upgrade, I concluded
that we have been using what is (and was in solr 3 and possibly earlier)
called a core. A group of two cores (with different schemas) we (probably
mistakenly) referred to as a shard. That is, the shard was a larger
semantic unit or chunk of data that would repeat itself in configuration
along the time axis. Each shard would hold data from a particular time
period.

What's a bit confusing is that, at least in my vocabulary, a collection is
similar to what the word group means. The confusion stems from the fact
that in a core config one defines a collection. But if we imagine a
series of cores created with the same schema, they could be united into a
group or a collection. Although to me, as a user (if the above
explanation holds, of course), a collection is an internal implementation
detail.


On Tue, Jul 16, 2013 at 9:59 PM, Shawn Heisey s...@elyograg.org wrote:

 On 7/16/2013 12:41 PM, Alexandre Rafalovitch wrote:

 Search this mailing list and you will find a very long discussion about the
 terminology and the confusion around it. My contribution to that was the
 crude picture trying to explain it: http://bit.ly/1aqohUf . Maybe it will
 help.

 If you don't want a longer URL, do use solr.xml with the @adminPath and
 @defaultCoreName parameters. But you don't need the rest.


 I'm relatively sure that defaultCoreName isn't there if you use the new
 core discovery mode that is default in the 4.4 example.  This new mode will
 be the only option in 5.0.  The old mode will continue to be supported
 throughout all 4.x versions.

 I think getting rid of defaultCoreName is the right move - Solr has been
 multicore in the standard example for quite some time.  Accessing Solr
 without a corename in the URL is a source of confusion for users when they
 venture outside the collection1 core that comes with the default example.

 IMHO, the additional capability and confusion inherent with SolrCloud
 makes it even more important that the user include a collection/core name
 when making their request.

 Thanks,
 Shawn




Re: JVM Crashed - SOLR deployed in Tomcat

2013-07-16 Thread Lance Norskog
I don't know about jvm crashes, but it is known that the Java 6 JVM had
various problems supporting Solr, including in the update 20-30 series. A lot
of people use the final JVM release (I think 6_30).


On 07/16/2013 12:25 PM, neoman wrote:

Hello Everyone,
We are using solrcloud with Tomcat in our production environment.
Here is our configuration.
solr-4.0.0
JVM 1.6.0_25

The JVM keeps crashing everyday with the following error. I think it is
happening while we try index the data with solrj APIs.

INFO: [aq-core] webapp=/solr path=/update
params={distrib.from=http://solr03-prod:8080/solr/aq-core/update/&update.distrib=TOLEADER&wt=javabin&version=2}
status=0 QTime=1
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xfd7ffadac771, pid=2411, tid=33662
#
# JRE version: 6.0_25-b06
# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.0-b11 mixed mode
solaris-amd64 compressed oops)
# Problematic frame:
# J
org.apache.lucene.codecs.PostingsConsumer.merge(Lorg/apache/lucene/index/MergeState;Lorg/apache/lucene/index/DocsEnum;Lorg/apache/lucene/util/FixedBitSet;)Lorg/apache/lucene/codecs/TermStats;
#
# An error report file with more information is saved as:
# /opt/tomcat/hs_err_pid2411.log
Jul 16, 2013 6:27:07 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start
commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false}
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp



Instructions: (pc=0xfd7ffadac771)
0xfd7ffadac751:   89 4c 24 30 4c 89 44 24 28 4c 89 54 24 18 44 89
0xfd7ffadac761:   5c 24 20 4c 8b 57 10 4d 63 d9 49 8b ca 49 03 cb
0xfd7ffadac771:   44 0f be 01 45 8b d9 41 ff c3 44 89 5f 18 45 85
0xfd7ffadac781:   c0 0f 8c b0 05 00 00 45 8b d0 45 8b da 41 d1 eb

Register to memory mapping:

RAX=0x14008cf2 is an unknown value
RBX=
[error occurred during error reporting (printing register info), id 0xb]

Stack: [0xfd7de4eff000,0xfd7de4fff000],  sp=0xfd7de4ffe140,
free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native
code)
J
org.apache.lucene.codecs.PostingsConsumer.merge(Lorg/apache/lucene/index/MergeState;Lorg/apache/lucene/index/DocsEnum;Lorg/apache/lucene/util/FixedBitSet;)Lorg/apache/lucene/codecs/TermStats;

Please let me know if anyone has seen this before. Any input is appreciated.







RE: Highlighting externally stored text

2013-07-16 Thread Bryan Loofbourrow
 I'm trying to find a way to best highlight search results even though
 those
 results are not stored in my index.  Has anyone been successful in
reusing
 the SOLR highlighting logic on non-stored data?

I was able to do this by slightly modifying the FastVectorHighlighter so
that it returned before computing snippets, instead returning the term
match offsets in the FieldPhraseList class. Of course you need to make
sure that your files are encoded in such a way that a character always has
the same byte width.
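
Roughly, the idea looks like this (a sketch only, against Lucene 4.x
FastVectorHighlighter internals; the field must be indexed with term vectors,
positions and offsets, and getPhraseList() stands in for whatever accessor the
modification exposes):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.vectorhighlight.FastVectorHighlighter;
import org.apache.lucene.search.vectorhighlight.FieldPhraseList;
import org.apache.lucene.search.vectorhighlight.FieldPhraseList.WeightedPhraseInfo;
import org.apache.lucene.search.vectorhighlight.FieldPhraseList.WeightedPhraseInfo.Toffs;
import org.apache.lucene.search.vectorhighlight.FieldQuery;
import org.apache.lucene.search.vectorhighlight.FieldTermStack;

public final class MatchOffsets {
  // Print the start/end character offsets of each query match in a document
  // field; the offsets can then be mapped onto the externally stored text.
  public static void dump(IndexReader reader, Query query, int docId,
                          String field) throws IOException {
    FieldQuery fq = new FastVectorHighlighter().getFieldQuery(query);
    FieldTermStack stack = new FieldTermStack(reader, docId, field, fq);
    FieldPhraseList phrases = new FieldPhraseList(stack, fq);
    for (WeightedPhraseInfo info : phrases.getPhraseList()) { // accessor assumed
      for (Toffs t : info.getTermsOffsets()) {
        System.out.println(t.getStartOffset() + ".." + t.getEndOffset());
      }
    }
  }
}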

-- Bryan


Re: Range query on a substring.

2013-07-16 Thread Marcin Rzewucki
Hi guys,

First of all, thanks for your response.

Jack: Data structure was created some time ago and this is a new
requirement in my project. I'm trying to find a solution. I wouldn't like
to split multivalued field into N similar records varying in this
particular field only. That could impact performance and imply more changes
in backend architecture as well. I'd prefer to create yet another
collection and use pseudo-joins...

Roman: Your ideas seem to be much closer to what I'm looking for. However,
the following syntax: text (1|2|3) does not work for me. Are you sure it
works like an OR inside a regexp?
By the way: honestly, I have one more requirement for which I would have to
extend the Solr query syntax. Basically, it should be possible to do some math
on a few fields and do a range query on the result (without indexing it,
because a combination of different fields is allowed). I'd like to spend
some time on ANTLR and the new way of parsing you mentioned. I will let you
know if it was useful for me. Thanks.

Kind regards.


On 16 July 2013 20:07, Roman Chyla roman.ch...@gmail.com wrote:

 Well, I think this is slightly too categorical - a range query on a
 substring can be thought of as a simple range query. So, for example the
 following query:

 lucene 1*

 becomes behind the scenes: lucene (10|11|12|13|14|1abcd)

 the issue there is that it is a string range, but it is a range query - it
 just has to be indexed in a clever way

 So, Marcin, you still have quite a few options besides the strict boolean
 query model

 1. have a special tokenizer chain which creates one token out of these
 groups (eg. some text prefix_1) and search for some text prefix_* [and
 do some post-filtering if necessary]
 2. another version, using regex /some text (1|2|3...)/ - you got the idea
 3. construct the lucene multi-term range query automatically, in your
 qparser - to produce a phrase query lucene (10|11|12|13|14)
 4. use payloads to index your integer at the position of some text and
 then retrieve only some text where the payload is in range x-y - an
 example is here, look at getPayloadQuery()

 https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/test/org/adsabs/lucene/BenchmarkAuthorSearch.java-
 but this is more complex situation and if you google, you will find a
 better description
 5. use a qparser that is able to handle nested search and analysis at the
  same time - eg. your query is: field:"some text" NEAR1 field:[0 TO 10] - i
 know about a parser that can handle this and i invite others to check it
 out (yeah, JIRA tickets need reviewers ;-))
 https://issues.apache.org/jira/browse/LUCENE-5014

 there might be others i forgot, but it is certainly doable; but as Jack
 points out, you may want to stop for a moment to reflect whether it is
 necessary
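
For option 2 concretely, the stock syntax would be roughly
q=myfield:/some text (1|2|3)/ (field name hypothetical); keep in mind a Lucene
regex matches a single indexed term, so this assumes the whole value is one
term, e.g. a string field.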

 HTH,

   roman


 On Tue, Jul 16, 2013 at 8:35 AM, Jack Krupansky j...@basetechnology.com
 wrote:

  Sorry, but you are basically misusing Solr (and multivalued fields),
  trying to take a shortcut to avoid a proper data model.
 
  To properly use Solr, you need to put each of these multivalued field
  values in a separate Solr document, with a text field and a value
  field. Then, you can query:
 
 text:"some text" AND value:[min-value TO max-value]
 
  Exactly how you should restructure your data model is dependent on all of
  your other requirements.
 
  You may be able to simply flatten your data.
 
  You may be able to use a simple join operation.
 
   Or, maybe you need to do a multi-step query operation if your data is
  sufficiently complex.
 
  If you want to keep your multivalued field in its current form for
 display
  purposes or keyword search, or exact match search, fine, but your stated
  goal is inconsistent with the Semantics of Solr and Lucene.
 
  To be crystal clear, there is no such thing as a range query on a
  substring in Solr or Lucene.
 
  -- Jack Krupansky
 
  -Original Message- From: Marcin Rzewucki
  Sent: Tuesday, July 16, 2013 5:13 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Range query on a substring.
 
 
  By multivalued I meant an array of values. For example:
  <arr name="myfield">
    <str>text1 (X)</str>
    <str>text2 (Y)</str>
  </arr>
 
  I'd like to avoid splitting it as you propose. I have a 2.3mn collection
  with pretty large records (a few hundred fields or more per record).
  Duplicating them would impact performance.
 
  Regards.
 
 
 
  On 16 July 2013 10:26, Oleg Burlaca oburl...@gmail.com wrote:
 
   Ah, you mean something like this:
  record:
  Id=10, text =  this is a text N1 (X), another text N2 (Y), text N3 (Z)
  Id=11, text =  this is a text N1 (W), another text N2 (Q), third text
  (M)
 
  and you need to search for: text N1 and X < B ?
  How big is the core? The first thing that comes to my mind, again at the
  indexing level, is to split the text into pieces and index it in solr like
  this:
 
  record_id | text  | value
  10   | text N1 | X
  10   | text N2 | Y
  10   | text 

Re: Clearing old nodes from zookeper without restarting solrcloud cluster

2013-07-16 Thread Marcin Rzewucki
Unloading a core is the known way to unregister a solr node in zookeeper
(so it is not used for further querying). It works for me. If you didn't do it
like this, unused nodes may remain in the cluster state and Solr may try to
use them without success. I'd suggest starting a machine with the old
name, running solr, joining the cluster for a while, unloading the core to
unregister it from the cluster, and shutting the host down at the end. This
way you would have a clean cluster state.
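
The CoreAdmin call itself is along these lines (host, port and core name are
placeholders):

http://localhost:8983/solr/admin/cores?action=UNLOAD&core=collection1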



On 16 July 2013 14:41, Luis Carlos Guerrero Covo
lcguerreroc...@gmail.comwrote:

 Thanks, I was actually asking about deleting nodes from the cluster state,
 not cores, unless you can unload cores specific to an already offline node
 from zookeeper.


 On Tue, Jul 16, 2013 at 1:55 AM, Marcin Rzewucki mrzewu...@gmail.com
 wrote:

  Hi,
 
  You should use CoreAdmin API (or Solr Admin page) and UNLOAD unneeded
  cores. This will unregister them from the zookeeper (cluster state will
 be
  updated), so they won't be used for querying any longer. Solrcloud
 restart
  is not needed in this case.
 
  Regards.
 
 
  On 16 July 2013 06:18, Ali, Saqib docbook@gmail.com wrote:
 
   Hello Luis,
  
   I don't think that is possible. If you delete clusterstate.json from
   zookeeper, you will need to restart the nodes.. I could be very
 wrong
   about this
  
   Saqib
  
  
   On Mon, Jul 15, 2013 at 8:50 PM, Luis Carlos Guerrero Covo 
   lcguerreroc...@gmail.com wrote:
  
I know that you can clear zookeeper's data directoy using the CLI
 with
   the
clear command, I just want to know if its possible to update the
   cluster's
state without wiping everything out. Anyone have any
 ideas/suggestions?
   
   
On Mon, Jul 15, 2013 at 11:21 AM, Luis Carlos Guerrero Covo 
lcguerreroc...@gmail.com wrote:
   
 Hi,

 Is there an easy way to clear zookeeper of all offline solr nodes
   without
 restarting the cluster? We are having some stability issues and we
   think
it
  may be due to the leader querying old offline nodes.

 thank you,

 Luis Guerrero

   
   
   
--
Luis Carlos Guerrero Covo
M.S. Computer Engineering
(57) 3183542047
   
  
 



 --
 Luis Carlos Guerrero Covo
 M.S. Computer Engineering
 (57) 3183542047



Searching w/explicit Multi-Word Synonym Expansion

2013-07-16 Thread dmarini
Hi Everyone,

I'm using Solr (version 4.3) for the first time and through much research I
got into writing a custom search handler using edismax to do relevancy
searches. Of course, the client I'm preparing the search for also has
synonyms (both bidirectional and explicit). After much research, I have
managed to get the bidirectional synonyms to work, but we have one scenario
that isn't behaving as expected. To simplify the example, imagine that my
collection has 2 fields:

Sku: String
Title: String

Using CopyFields, I copy these to 2 more fields, SkuSearch and TitleSearch
which have a type that corresponds to the following field type in the schema
file:



As you can see, the bidirectional synonyms (ones that look like the
following:  ipod, i-pod, iPod) are expanded and stored in the index (the
synonyms.txt file) as per the best practices from the wiki. One unique thing
I've seen is that we have a bunch of shortcut terms where a user wants to
type in lp and it will bring up one of 5 skus. So I created a
shortcuts.txt file that has only the explicit synonym mappings (like so:  lp
=> 12345, 98765, 11010). My thought in including only these in the query
analyzer portion is that since explicit synonyms are not expanded (since the
sku values are already indexed in the field as they should be) and the
expand=true is useless for explicit synonyms (based on my reading), I can
just use the explicit synonym to expand the query term to its mapped skus and
just find documents containing them, but it's not working like it does in my
head :)

I'll paste my handler below, here's the issue. for use cases like the one
above it's working. It's when I have an entry in shortcuts.txt that looks
like this: (hot dog => 12345, 67890, 10232) that I don't get anything back
if I put in hot dog without quotes, but I do get results when I use "hot dog"
with quotes.

Is there any way to get the results without quotes? am I doing something
wrong altogether? are there any other suggestions?  my search handler looks
as follows:



Thanks for any help that can be offered.

--Dave





Re: How to optimize a search?

2013-07-16 Thread padcoe
I've solved it by removing <filter class="solr.DoubleMetaphoneFilterFactory"
inject="true"/>.

But now I have a problem: if I search for Rocket Bananaa (with a double
'a'), the expected result doesn't appear first.

Any ideas how to fix it?





Re: How to optimize a search?

2013-07-16 Thread padcoe
Rocket Banana (Single) should be first because it's the closest to Rocket
Banana.

How can I get an ideal ranking that returns the closest matches in the first
positions?





Where to specify numShards when startup up a cloud setup

2013-07-16 Thread Robert Stewart
I want to script the creation of N solr cloud instances (on ec2).

But it's not clear to me where I would specify the numShards setting.
From documentation, I see you can specify on the first node you start up, OR 
alternatively, use the collections API to create a new collection - but in 
that case you need first at least one running SOLR instance.  I want to push 
all solr instances with similar configuration onto N instances and just run 
them with some number of shards pre-set somehow.  Where can I put numShards 
configuration setting?
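
(The Collections API variant I mean would be something along these lines, with
placeholder host, collection name and counts, but it needs at least one
running Solr instance first:
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=2)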

What I want to do:

1) push solr configuration to zookeeper ensemble using zkCli command-line tool.
2) create N instances of SOLR running on Ec2, pointing to the same zookeeper
3) start all SOLR instances which will become a cloud setup with M shards 
(where M<N), and N-M replicas.

Currently everything starts up with 1 shard, and N replicas.

I already have one single collection pre-configured.


Re: Range query on a substring.

2013-07-16 Thread Roman Chyla
On Tue, Jul 16, 2013 at 5:08 PM, Marcin Rzewucki mrzewu...@gmail.comwrote:

 Hi guys,

 First of all, thanks for your response.

 Jack: Data structure was created some time ago and this is a new
 requirement in my project. I'm trying to find a solution. I wouldn't like
 to split multivalued field into N similar records varying in this
 particular field only. That could impact performance and imply more changes
 in backend architecture as well. I'd prefer to create yet another
 collection and use pseudo-joins...

 Roman: Your ideas seem to be much closer to what I'm looking for. However,
 the following syntax: text (1|2|3) does not work for me. Are you sure it
 works like an OR inside a regexp?


I wasn't clear, sorry: the text (1|2|3) is a result of the term expansion
- you can see something like that when you look at debugQuery=true output
after you sent phrase quer* - lucene will search for the variants by
enumerating the possible alternatives, hence phrase (token|token|token)

it is possible to construct such a query manually, it depends on your
application

one more thing: the term expansion depends on the type of the field (ie.
expanding string field is different from the int field type), yet you could
very easily write a small processor that looks at the range values and
treats them as numbers (*after* they were parsed by the qparser, but
*before* they were built into a query - hmmm, now when I think of it...
your values will be indexed as strings, so you have to search/expand into
string byterefs - it's doable, just wanted to point out this detail - in
normal situations, SOLR will be building query tokens using the string/text
field, because your field will be of that type)

roman



 By the way: Honestly, I have one more requirement for which I would have to
 extend the Solr query syntax. Basically, it should be possible to do some math
 on a few fields and do a range query on the result (without indexing it,
 because a combination of different fields is allowed). I'd like to spend
 some time on ANTLR and the new way of parsing you mentioned. I will let you
 know if it was useful for me. Thanks.

 Kind regards.


 On 16 July 2013 20:07, Roman Chyla roman.ch...@gmail.com wrote:

  Well, I think this is slightly too categorical - a range query on a
  substring can be thought of as a simple range query. So, for example the
  following query:
 
  lucene 1*
 
  becomes behind the scenes: lucene (10|11|12|13|14|1abcd)
 
  the issue there is that it is a string range, but it is a range query -
 it
  just has to be indexed in a clever way
 
  So, Marcin, you still have quite a few options besides the strict boolean
  query model
 
  1. have a special tokenizer chain which creates one token out of these
  groups (eg. some text prefix_1) and search for some text prefix_*
 [and
  do some post-filtering if necessary]
  2. another version, using regex /some text (1|2|3...)/ - you got the idea
  3. construct the lucene multi-term range query automatically, in your
  qparser - to produce a phrase query lucene (10|11|12|13|14)
  4. use payloads to index your integer at the position of some text and
  then retrieve only some text where the payload is in range x-y - an
  example is here, look at getPayloadQuery()
 
 
 https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/test/org/adsabs/lucene/BenchmarkAuthorSearch.java-
  but this is more complex situation and if you google, you will find a
  better description
  5. use a qparser that is able to handle nested search and analysis at the
  same time - eg. your query is: field:"some text" NEAR1 field:[0 TO 10] -
 i
  know about a parser that can handle this and i invite others to check it
  out (yeah, JIRA tickets need reviewers ;-))
  https://issues.apache.org/jira/browse/LUCENE-5014
 
  there might be others i forgot, but it is certainly doable; but as Jack
  points out, you may want to stop for a moment to reflect whether it is
  necessary
 
  HTH,
 
roman
 
 
  On Tue, Jul 16, 2013 at 8:35 AM, Jack Krupansky j...@basetechnology.com
  wrote:
 
   Sorry, but you are basically misusing Solr (and multivalued fields),
   trying to take a shortcut to avoid a proper data model.
  
   To properly use Solr, you need to put each of these multivalued field
   values in a separate Solr document, with a text field and a value
   field. Then, you can query:
  
   text:"some text" AND value:[min-value TO max-value]
  
   Exactly how you should restructure your data model is dependent on all
 of
   your other requirements.
  
   You may be able to simply flatten your data.
  
   You may be able to use a simple join operation.
  
    Or, maybe you need to do a multi-step query operation if your data is
   sufficiently complex.
  
   If you want to keep your multivalued field in its current form for
  display
   purposes or keyword search, or exact match search, fine, but your
 stated
   goal is inconsistent with the Semantics of Solr and Lucene.
  
   To be crystal clear, 

Re: Searching w/explicit Multi-Word Synonym Expansion

2013-07-16 Thread Jack Krupansky
In case you were unaware, generalized multi-word synonym expansion is an 
unsolved problem in Lucene/Solr. Sure, some of the tools are there and you 
can sometimes make it work for some situations, but not for the general 
case. Some work has been in progress, but no near-term solution is at hand.


-- Jack Krupansky

-Original Message- 
From: dmarini

Sent: Tuesday, July 16, 2013 5:23 PM
To: solr-user@lucene.apache.org
Subject: Searching w/explicit Multi-Word Synonym Expansion

Hi Everyone,

I'm using Solr (version 4.3) for the first time and through much research I
got into writing a custom search handler using edismax to do relevancy
searches. Of course, the client I'm preparing the search for also has
synonyms (both bidirectional and explicit). After much research, I have
managed to get the bidirectional synonyms to work, but we have one scenario
that isn't behaving as expected. To simplify the example, imagine that my
collection has 2 fields:

Sku: String
Title: String

Using CopyFields, I copy these to 2 more fields, SkuSearch and TitleSearch
which have a type that corresponds to the following field type in the schema
file:



As you can see, the bidirectional synonyms (ones that look like the
following:  ipod, i-pod, iPod) are expanded and stored in the index (the
synonyms.txt file) as per the best practices from the wiki. One unique thing
I've seen is that we have a bunch of shortcut terms where a user wants to
type in lp and it will bring up one of 5 skus. So I created a
shortcuts.txt file that has only the explicit synonym mappings (like so:  lp
=> 12345, 98765, 11010). My thought in including only these in the query
analyzer portion is that since explicit synonyms are not expanded (since the
sku values are already indexed in the field as they should be) and the
expand=true is useless for explicit synonyms (based on my reading), I can
just use the explicit synonym to expand the query term to its mapped skus and
just find documents containing them, but it's not working like it does in my
head :)

I'll paste my handler below, here's the issue. for use cases like the one
above it's working. It's when I have an entry in shortcuts.txt that looks
like this: (hot dog => 12345, 67890, 10232) that I don't get anything back
if I put in hot dog without quotes, but I do get results when I use "hot dog"
with quotes.

Is there any way to get the results without quotes? am I doing something
wrong altogether? are there any other suggestions?  my search handler looks
as follows:



Thanks for any help that can be offered.

--Dave






Re: How to optimize a search?

2013-07-16 Thread Walter Underwood
Use fuzzy search instead of phonetic search. Phonetic search is a poor match to 
most queries.

At Netflix, we dropped phonetic search and started using fuzzy. There was a 
clear improvement in the A/B test.
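
A fuzzy query is just a trailing tilde, e.g. (the field name is only an
example): name:(rocket~2 banana~2). In Solr 4.x syntax the number after the
tilde is the maximum edit distance allowed.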

wunder

On Jul 16, 2013, at 2:25 PM, padcoe wrote:

 Rocket Banana (Single) should be first because its the closest to Rocket
 Banana.
 
 How can i get a ideal rank to return closests words in firsts position?
 
 






Re: Where to specify numShards when startup up a cloud setup

2013-07-16 Thread Ali, Saqib
What does the solr.xml look like on the nodes?


On Tue, Jul 16, 2013 at 2:36 PM, Robert Stewart robert_stew...@epam.comwrote:

 I want to script the creation of N solr cloud instances (on ec2).

 But its not clear to me where I would specify numShards setting.
 From documentation, I see you can specify on the first node you start
 up, OR alternatively, use the collections API to create a new collection
 - but in that case you need first at least one running SOLR instance.  I
 want to push all solr instances with similar configuration onto N instances
 and just run them with some number of shards pre-set somehow.  Where can I
 put numShards configuration setting?

 What I want to do:

 1) push solr configuration to zookeeper ensemble using zkCli command-line
 tool.
 2) create N instances of SOLR running on Ec2, pointing to the same
 zookeeper
 3) start all SOLR instances which will become a cloud setup with M shards
 (where MN), and N-M replicas.

 Currently everything starts up with 1 shards, and N replicas.

 I already have one single collection pre-configured.



Re: Range query on a substring.

2013-07-16 Thread Ahmet Arslan
Hi Marcin,

Maybe you can use https://issues.apache.org/jira/browse/SOLR-1604 .
ComplexPhraseQueryParser supports ranges inside phrases.
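A sketch of what such a query could look like with that parser (field name
hypothetical; the exact syntax depends on the patch version):
q={!complexphrase}myfield:"text1 [1 TO 5]"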



 From: Marcin Rzewucki mrzewu...@gmail.com
To: solr-user@lucene.apache.org 
Sent: Wednesday, July 17, 2013 12:08 AM
Subject: Re: Range query on a substring.
 

Hi guys,

First of all, thanks for your response.

Jack: Data structure was created some time ago and this is a new
requirement in my project. I'm trying to find a solution. I wouldn't like
to split multivalued field into N similar records varying in this
particular field only. That could impact performance and imply more changes
in backend architecture as well. I'd prefer to create yet another
collection and use pseudo-joins...

Roman: Your ideas seem to be much closer to what I'm looking for. However,
the following syntax: text (1|2|3) does not work for me. Are you sure it
works like an OR inside a regexp?
By the way: Honestly, I have one more requirement for which I would have to
extend the Solr query syntax. Basically, it should be possible to do some math
on a few fields and do a range query on the result (without indexing it,
because a combination of different fields is allowed). I'd like to spend
some time on ANTLR and the new way of parsing you mentioned. I will let you
know if it was useful for me. Thanks.

Kind regards.


On 16 July 2013 20:07, Roman Chyla roman.ch...@gmail.com wrote:

 Well, I think this is slightly too categorical - a range query on a
 substring can be thought of as a simple range query. So, for example the
 following query:

 lucene 1*

 becomes behind the scenes: lucene (10|11|12|13|14|1abcd)

 the issue there is that it is a string range, but it is a range query - it
 just has to be indexed in a clever way

 So, Marcin, you still have quite a few options besides the strict boolean
 query model

 1. have a special tokenizer chain which creates one token out of these
 groups (eg. some text prefix_1) and search for some text prefix_* [and
 do some post-filtering if necessary]
 2. another version, using regex /some text (1|2|3...)/ - you got the idea
 3. construct the lucene multi-term range query automatically, in your
 qparser - to produce a phrase query lucene (10|11|12|13|14)
 4. use payloads to index your integer at the position of some text and
 then retrieve only some text where the payload is in range x-y - an
 example is here, look at getPayloadQuery()

 https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/test/org/adsabs/lucene/BenchmarkAuthorSearch.java-
 but this is more complex situation and if you google, you will find a
 better description
 5. use a qparser that is able to handle nested search and analysis at the
  same time - eg. your query is: field:"some text" NEAR1 field:[0 TO 10] - i
 know about a parser that can handle this and i invite others to check it
 out (yeah, JIRA tickets need reviewers ;-))
 https://issues.apache.org/jira/browse/LUCENE-5014

 there might be others i forgot, but it is certainly doable; but as Jack
 points out, you may want to stop for a moment to reflect whether it is
 necessary

 HTH,

   roman


 On Tue, Jul 16, 2013 at 8:35 AM, Jack Krupansky j...@basetechnology.com
 wrote:

  Sorry, but you are basically misusing Solr (and multivalued fields),
  trying to take a shortcut to avoid a proper data model.
 
  To properly use Solr, you need to put each of these multivalued field
  values in a separate Solr document, with a text field and a value
  field. Then, you can query:
 
      text:"some text" AND value:[min-value TO max-value]
 
  Exactly how you should restructure your data model is dependent on all of
  your other requirements.
 
  You may be able to simply flatten your data.
 
  You may be able to use a simple join operation.
 
  Or, maybe you need to do a multi-step query operation if your data is
  sufficiently complex.
 
  If you want to keep your multivalued field in its current form for display
  purposes or keyword search, or exact match search, fine, but your stated
  goal is inconsistent with the semantics of Solr and Lucene.
 
  To be crystal clear, there is no such thing as a range query on a
  substring in Solr or Lucene.
 
  -- Jack Krupansky
 
  -Original Message- From: Marcin Rzewucki
  Sent: Tuesday, July 16, 2013 5:13 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Range query on a substring.
 
 
  By multivalued I meant an array of values. For example:
  <arr name="myfield">
    <str>text1 (X)</str>
    <str>text2 (Y)</str>
  </arr>
 
  I'd like to avoid splitting it as you propose. I have a 2.3M-document
  collection with pretty large records (a few hundred fields or more per
  record). Duplicating them would impact performance.
 
  Regards.
 
 
 
  On 16 July 2013 10:26, Oleg Burlaca oburl...@gmail.com wrote:
 
   Ah, you mean something like this:
  record:
  Id=10, text = "this is a text N1 (X), another text N2 (Y), text N3 (Z)"
  Id=11, text = "this is a text N1 (W), another text N2 (Q), 

Re: Where to specify numShards when starting up a cloud setup

2013-07-16 Thread Shawn Heisey

On 7/16/2013 3:36 PM, Robert Stewart wrote:

I want to script the creation of N solr cloud instances (on ec2).

But it's not clear to me where I would specify the numShards setting.
 From the documentation, I see you can specify it on the first node you start up, OR 
alternatively, use the collections API to create a new collection - but in that case 
you first need at least one running SOLR instance.  I want to push a similar 
configuration onto N SOLR instances and just run them with some number of shards pre-set somehow.  Where 
can I put the numShards configuration setting?

What I want to do:

1) push solr configuration to zookeeper ensemble using zkCli command-line tool.
2) create N instances of SOLR running on Ec2, pointing to the same zookeeper
3) start all SOLR instances which will become a cloud setup with M shards (where 
M < N), and N-M replicas.


A minimal redundant SolrCloud cluster consists of two larger machines 
that run Solr and zookeeper, plus a third smaller machine that runs just 
zookeeper.  This is just the minimum requirement; you can use additional 
and more powerful servers.


The general way that you should set up a brand new SolrCloud is as follows.  If anyone 
spots a problem with this, please don't hesitate to mention it:


1) Set up three hosts running standalone zookeeper, configured as a 
fully redundant ensemble.  This is outside the scope of Solr 
documentation, please consult the zookeeper site:


http://zookeeper.apache.org

2) Construct a zkHost parameter for your ZK ensemble.  An example is 
below using the default zookeeper port of 2181.  You'd need to use the 
proper port numbers, names, etc.  The /chroot part is optional, but 
highly recommended.  Use a name that has meaning for your SolrCloud 
cluster rather than chroot:


-DzkHost=server1:2181,server2:2181,server3:2181/chroot

By using the /chroot syntax, you can run more than one SolrCloud cluster 
on your zookeeper ensemble.  Just use a different value for each cluster.


3) Start Solr with the same zkHost parameter on every Solr host, 
referring to the three zookeeper hosts already set up.  You can use the 
same hosts for Solr as you did for zookeeper.


4) Use the zkcli script in example/cloud-scripts to upload a 
configuration set to zookeeper using the upconfig command.  If you 
aren't using the Solr example or a custom install based on the example, 
then you'll need to examine the script to figure out how to run the java 
command manually and have it find the solr and zookeeper jars.
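
For illustration, an upconfig invocation looks something like this (the 
zkHost string, paths and config name are placeholders for your own values):

  example/cloud-scripts/zkcli.sh -zkhost server1:2181,server2:2181,server3:2181/chroot \
    -cmd upconfig -confdir /path/to/myconf -confname mycfg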


5) Use the Collections API to create a collection, referencing the 
uploaded config set and including additional parameters like numShards. 
 If you have four Solr hosts, the following API call would work perfectly:


http://server:port/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&collection.configName=mycfg

Thanks,
Shawn



Re: Range query on a substring.

2013-07-16 Thread Jack Krupansky

Yeah, I was thinking about that.

But... will it properly order "10" as being greater than "9"?  Usually, we 
use trie or sorted field types to assure numeric order, but a text field 
doesn't have that feature.


Although I did think that maybe you could have a token filter that mapped 
numeric values to a fixed number of digits with leading zeros, and then they 
would be properly ordered. But, I don't think we have a token filter that 
can do that, although I imagine that a new one could be proposed.
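
Just to sketch the idea: a hypothetical filter along these lines (this does 
not ship with Lucene, and the class and parameter names here are mine):

code
import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/** Left-pads purely numeric tokens with zeros so string order matches numeric order. */
public final class ZeroPadNumberFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final int width; // target number of digits, e.g. 10

  public ZeroPadNumberFilter(TokenStream input, int width) {
    super(input);
    this.width = width;
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    String term = termAtt.toString();
    // Only touch tokens that are all digits and shorter than the target width.
    if (term.length() < width && term.matches("\\d+")) {
      StringBuilder padded = new StringBuilder(width);
      for (int j = term.length(); j < width; j++) {
        padded.append('0');
      }
      padded.append(term);
      termAtt.setEmpty().append(padded.toString());
    }
    return true;
  }
}
/code

With width=10, "9" would be indexed as "0000000009" and "10" as "0000000010", 
so plain string ordering would agree with numeric ordering.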


-- Jack Krupansky

-Original Message- 
From: Ahmet Arslan

Sent: Tuesday, July 16, 2013 6:33 PM
To: solr-user@lucene.apache.org
Subject: Re: Range query on a substring.

Hi Marcin,

May be you can use https://issues.apache.org/jira/browse/SOLR-1604 . 
ComplexPhraseQueryParser supports ranges inside phrases.




From: Marcin Rzewucki mrzewu...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, July 17, 2013 12:08 AM
Subject: Re: Range query on a substring.


Hi guys,

First of all, thanks for your response.

Jack: The data structure was created some time ago and this is a new
requirement in my project. I'm trying to find a solution. I wouldn't like
to split the multivalued field into N similar records varying in this
particular field only. That could impact performance and imply more changes
in the backend architecture as well. I'd prefer to create yet another
collection and use pseudo-joins...

Roman: Your ideas seem to be much closer to what I'm looking for. However,
the following syntax: "text (1|2|3)" does not work for me. Are you sure it
works like an OR inside a regexp?
By the way: honestly, I have one more requirement for which I would have to
extend the Solr query syntax. Basically, it should be possible to do some math
on a few fields and run a range query on the result (without indexing it,
because any combination of different fields is allowed). I'd like to spend
some time on ANTLR and the new way of parsing you mentioned. I will let you
know if it was useful for me. Thanks.

Kind regards.


On 16 July 2013 20:07, Roman Chyla roman.ch...@gmail.com wrote:


Well, I think this is slightly too categorical - a range query on a
substring can be thought of as a simple range query. So, for example the
following query:

"lucene 1*"

becomes, behind the scenes: "lucene (10|11|12|13|14|1abcd)"

the issue there is that it is a string range, but it is a range query - it
just has to be indexed in a clever way

So, Marcin, you still have quite a few options besides the strict boolean
query model

1. have a special tokenizer chain which creates one token out of these
groups (eg. "some text prefix_1") and search for "some text prefix_*" [and
do some post-filtering if necessary]
2. another version, using regex /some text (1|2|3...)/ - you get the idea
3. construct the lucene multi-term range query automatically, in your
qparser - to produce a phrase query "lucene (10|11|12|13|14)"
4. use payloads to index your integer at the position of "some text" and
then retrieve only "some text" where the payload is in range x-y - an
example is here, look at getPayloadQuery():

https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/test/org/adsabs/lucene/BenchmarkAuthorSearch.java
- but this is a more complex situation and if you google, you will find a
better description
5. use a qparser that is able to handle nested search and analysis at the
same time - eg. your query is: field:"some text" NEAR1 field:[0 TO 10] - i
know about a parser that can handle this and i invite others to check it
out (yeah, JIRA tickets need reviewers ;-))
https://issues.apache.org/jira/browse/LUCENE-5014

there might be others i forgot, but it is certainly doable; but as Jack
points out, you may want to stop for a moment to reflect whether it is
necessary

HTH,

  roman


On Tue, Jul 16, 2013 at 8:35 AM, Jack Krupansky j...@basetechnology.com
wrote:

 Sorry, but you are basically misusing Solr (and multivalued fields),
 trying to take a shortcut to avoid a proper data model.

 To properly use Solr, you need to put each of these multivalued field
 values in a separate Solr document, with a "text" field and a "value"
 field. Then, you can query:

 text:"some text" AND value:[min-value TO max-value]

 Exactly how you should restructure your data model is dependent on all of
 your other requirements.

 You may be able to simply flatten your data.

 You may be able to use a simple join operation.

 Or, maybe you need to do a multi-step query operation if your data is
 sufficiently complex.

 If you want to keep your multivalued field in its current form for display
 purposes or keyword search, or exact match search, fine, but your stated
 goal is inconsistent with the semantics of Solr and Lucene.

 To be crystal clear, there is no such thing as a range query on a
 substring in Solr or Lucene.

 -- Jack Krupansky

 -Original Message- From: Marcin Rzewucki
 Sent: Tuesday, July 16, 2013 5:13 AM
 To: solr-user@lucene.apache.org
 

Re: Searching w/explicit Multi-Word Synonym Expansion

2013-07-16 Thread Ahmet Arslan
Hi dmarini,

Did you consider using http://wiki.apache.org/solr/QueryElevationComponent ?
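
The "lp" shortcut case could then be expressed in elevate.xml, roughly like
this (a sketch - the doc ids are the ones from your example and assume id
is the unique key field):

code
<elevate>
  <query text="lp">
    <doc id="12345"/>
    <doc id="98765"/>
    <doc id="11010"/>
  </query>
</elevate>
/code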





 From: Jack Krupansky j...@basetechnology.com
To: solr-user@lucene.apache.org 
Sent: Wednesday, July 17, 2013 12:53 AM
Subject: Re: Searching w/explicit Multi-Word Synonym Expansion
 

In case you were unaware, generalized multi-word synonym expansion is an 
unsolved problem in Lucene/Solr. Sure, some of the tools are there and you 
can sometimes make it work for some situations, but not for the general 
case. Some work has been in progress, but no near-term solution is at hand.

-- Jack Krupansky

-Original Message- 
From: dmarini
Sent: Tuesday, July 16, 2013 5:23 PM
To: solr-user@lucene.apache.org
Subject: Searching w/explicit Multi-Word Synonym Expansion

Hi Everyone,

I'm using Solr (version 4.3) for the first time and through much research I
got into writing a custom search handler using edismax to do relevancy
searches. Of course, the client I'm preparing the search for also has
synonyms (both bidirectional and explicit). After much research, I have
managed to get the bidirectional synonyms to work, but we have one scenario
that isn't behaving as expected. To simplify the example, imagine that my
collection has 2 fields:

Sku: String
Title: String

Using CopyFields, I copy these to 2 more fields, SkuSearch and TitleSearch
which have a type that corresponds to the following field type in the schema
file:



As you can see, the bidirectional synonyms (ones that look like the
following: ipod, i-pod, iPod) are expanded and stored in the index (the
synonyms.txt file) as per the best practices from the wiki. One unique thing
I've seen is that we have a bunch of shortcut terms where a user wants to
type in "lp" and have it bring up one of 5 skus. So I created a
shortcuts.txt file that has only the explicit synonym mappings (like so: lp
=> 12345, 98765, 11010). My thought in including only these in the query
analyzer portion is that since explicit synonyms are not expanded (since the
sku values are already indexed in the field as they should be) and
expand=true is useless for explicit synonyms (based on my reading), I can
just use the explicit synonym to expand the query term to its mapped skus and
just find documents containing them, but it's not working like it does in my
head :)

I'll paste my handler below; here's the issue: for use cases like the one
above it's working. It's when I have an entry in shortcuts.txt that looks
like this: (hot dog => 12345, 67890, 10232) that I don't get anything back
if I put in hot dog, but I do get results when I use "hot dog" with quotes.

Is there any way to get the results without quotes? Am I doing something
wrong altogether? Are there any other suggestions? My search handler looks
as follows:
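
Roughly, the handler has this shape (a sketch with illustrative field names
and boosts; the exact XML was lost in the archive):

code
<requestHandler name="/relevancy" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">SkuSearch^2.0 TitleSearch</str>
  </lst>
</requestHandler>
/code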



Thanks for any help that can be offered.

--Dave




Re: Solr is not responding on deployment in tomcat

2013-07-16 Thread Erick Erickson
Yes, you need to use a different port for Solr.
As for the contextpath, I have no idea.

Best
Erick

On Tue, Jul 16, 2013 at 2:02 PM, Per Newgro per.new...@gmx.ch wrote:
 Thanks Erick,

 I've configured both to use 8080 (for Wicket this is standard :-)).

 Do I have to assign a different port to Solr if I use both webapps in the
 same container?
 Btw. the context path for my Wicket app is /*.
 Could that be a problem too?

 Per

 Am 15.07.2013 17:12, schrieb Erick Erickson:

 Sounds like Wicket and Solr are using the same port(s)...

 If you start Wicket first then look at the Solr logs, you might
 see some message about port already in use or some such.

 If this is SolrCloud, there are also the ZooKeeper ports to
 wonder about.

 Best
 Erick

 On Mon, Jul 15, 2013 at 6:49 AM, Per Newgro per.new...@gmx.ch wrote:

 Hi,

 maybe someone here can help me with my solr-4.3.1 issue.

 I've successfully deployed the solr.war on a Tomcat 7 instance.
 Starting Tomcat with only the solr.war deployed - works nicely.
 I can see the admin interface and the logs are clean.

 If I
 deploy my wicket-spring-data-solr based app (using the HttpSolrServer)
 after the solr app
 without restarting the tomcat
 => all is fine too.

 I've implemented a ping to see if server is up.

 code
 private void waitUntilSolrIsAvailable(int i) {
     if (i == 0) {
         logger.info("Check solr state...");
     }
     if (i > 5) {
         throw new RuntimeException("Solr is not available after more than 25 secs. Going down now.");
     }
     if (i > 0) {
         try {
             logger.info("Wait for solr to get alive.");
             // Thread.sleep(), not wait(): calling wait() without holding the
             // monitor throws IllegalMonitorStateException.
             Thread.sleep(5000);
         } catch (InterruptedException e) {
             throw new RuntimeException(e);
         }
     }
     try {
         i++;
         SolrPingResponse r = solrServer.ping();
         if (r.getStatus() > 0) {
             waitUntilSolrIsAvailable(i);
         }
         logger.info("Solr is alive.");
     } catch (SolrServerException | IOException e) {
         throw new RuntimeException(e);
     }
 }
 /code

 Here I can see the log
 log
 54295 [localhost-startStop-2] INFO  org.apache.wicket.Application  –
 [wicket.project] init: Wicket extensions initializer
 INFO  - 2013-07-15 12:07:45.261;
 de.company.service.SolrServerInitializationService; Check solr state...
 54505 [localhost-startStop-2] INFO
 de.company.service.SolrServerInitializationService  – Check solr state...
 INFO  - 2013-07-15 12:07:45.768; org.apache.solr.core.SolrCore;
 [collection1] webapp=/solr path=/admin/ping params={wt=javabin&version=2}
 hits=0 status=0 QTime=20
 55012 [http-bio-8080-exec-1] INFO  org.apache.solr.core.SolrCore  –
 [collection1] webapp=/solr path=/admin/ping params={wt=javabin&version=2}
 hits=0 status=0 QTime=20
 INFO  - 2013-07-15 12:07:45.770; org.apache.solr.core.SolrCore;
 [collection1] webapp=/solr path=/admin/ping params={wt=javabin&version=2}
 status=0 QTime=22
 55014 [http-bio-8080-exec-1] INFO  org.apache.solr.core.SolrCore  –
 [collection1] webapp=/solr path=/admin/ping params={wt=javabin&version=2}
 status=0 QTime=22
 INFO  - 2013-07-15 12:07:45.854;
 de.company.service.SolrServerInitializationService; Solr is alive.
 55098 [localhost-startStop-2] INFO
 de.company.service.SolrServerInitializationService  – Solr is alive.
 /log

 But if I
 restart Tomcat
 with both webapps (solr and wicket),
 Solr is not responding to the ping request.

 log
 INFO  - 2013-07-15 12:02:27.634; org.apache.wicket.Application;
 [wicket.project] init: Wicket extensions initializer
 11932 [localhost-startStop-1] INFO  org.apache.wicket.Application  –
 [wicket.project] init: Wicket extensions initializer
 INFO  - 2013-07-15 12:02:27.787;
 de.company.service.SolrServerInitializationService; Check solr state...
 12085 [localhost-startStop-1] INFO
 de.company.service.SolrServerInitializationService  – Check solr state...
 /log

 What could that be, or how can I get more info on where this is stopping?

 Thanks for your support
 Per




Re: How to use joins in solr 4.3.1

2013-07-16 Thread Erick Erickson
You can only join on indexed fields; your Location:merchantId field is not
indexed.
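
A sketch of the fix, assuming the rest of the definition stays the same:
make the Location field

<field name="merchantId" type="string" indexed="true" stored="true" multiValued="false"/>

and reindex the location collection; after that the {!join} fq should start
matching. Note also that the earlier NPE is likely because fq={!join ...} had
no query after the closing brace, so the join had nothing to run on the
"from" side; your second form, {!join ...}*:*, supplies one.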

Best
Erick

On Tue, Jul 16, 2013 at 2:48 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote:
 Found this post:
 http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201302.mbox/%3CCAB_8Yd82aqq=oY6dBRmVjG7gvBBewmkZGF9V=fpne4xgkbu...@mail.gmail.com%3E

 And based on the answer, I modified my query:
 localhost:8983/solr/location/select?fq={!join from=key to=merchantId fromIndex=merchant}*:*

 I don't see any errors, but my original problem still persists, no
 documents are returned.
 The two fields on which I am trying to join is:

 Merchant: <field name="merchantId" type="string" indexed="true"
 stored="true" multiValued="false"/>
 Location: <field name="merchantId" type="string" indexed="false"
 stored="true" multiValued="false"/>

 Thanks,
 -Utkarsh


 On Tue, Jul 16, 2013 at 11:39 AM, Utkarsh Sengar utkarsh2...@gmail.comwrote:

 Looks like the JoinQParserPlugin is throwing an NPE.
 Query: localhost:8983/solr/location/select?q=*:*&fq={!join from=key
 to=merchantId fromIndex=merchant}

 84343345 [qtp2012387303-16] ERROR org.apache.solr.core.SolrCore  –
 java.lang.NullPointerException
 at
 org.apache.solr.search.JoinQuery.hashCode(JoinQParserPlugin.java:580)
 at org.apache.solr.search.QueryResultKey.<init>(QueryResultKey.java:50)
 at
 org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1274)
 at
 org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:457)
 at
 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:410)
 at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
 at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
 at
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
 at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
 at
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
 at
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
 at
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:365)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
 at
 org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
 at
 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856)
 at
 org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
 at
 org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
 at
 org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
 at java.lang.Thread.run(Thread.java:662)

 84343350 [qtp2012387303-16] INFO  org.apache.solr.core.SolrCore  –
 [location] webapp=/solr path=/select
 params={distrib=false&wt=javabin&version=2&rows=10&df=allText&fl=key,score&shard.url=x:8983/solr/location/&NOW=1373999694930&start=0&q=*:*&_=1373999505886&isShard=true&fq={!join+from%3Dkey+to%3DmerchantId+fromIndex%3Dmerchant}&fsv=true}
 status=500 QTime=6
 84343351 [qtp2012387303-16] ERROR
 org.apache.solr.servlet.SolrDispatchFilter  –
 null:java.lang.NullPointerException
 at
 org.apache.solr.search.JoinQuery.hashCode(JoinQParserPlugin.java:580)
 at org.apache.solr.search.QueryResultKey.<init>(QueryResultKey.java:50)
 at
 

Re: SolrCloud: Collection API question and problem with core loading

2013-07-16 Thread Erick Erickson
All of the core loading stuff is on the server side, so CloudSolrServer
isn't really germane (I don't think anyway).

This is in a bit of flux, so try having one core that's loaded on startup
even if it's just a dummy core. There's currently ongoing work to
play nicer with no cores being defined at startup, but that's not in
4.3.

Take a look at: http://wiki.apache.org/solr/CoreAdmin#CREATE
where it talks about optional parameters.
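
For example, something along these lines (check that page for the exact
parameter names your version supports; the host, core and collection names
here are placeholders):

http://host:8983/solr/admin/cores?action=CREATE&name=core1&collection=mycollection&shard=shard1&loadOnStartup=false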

NOTE: 4.4 (release imminent) has substantial fixes for the whole
persistence situation. Also note that solr.xml is going away as a
place to store core information; core discovery will be the only
supported mechanism from 5.x on.

Good Luck!
Erick

On Mon, Jul 15, 2013 at 9:05 PM, Patrick Mi
patrick...@touchpointgroup.com wrote:
 Hi there,

 I run 2 Solr instances (Tomcat 7, Solr 4.3.0, one shard), one external
 Zookeeper instance, and have lots of cores.

 I use the collection API to create a new core dynamically after the
 configuration for the core is uploaded to Zookeeper, and it all works
 fine.

 As there are so many cores it takes a very long time to load them at start up.
 I would like to start up the server quickly and load the cores on demand.

 When the core is created via the collection API it is created with the default
 parameter loadOnStartup=true (this can be seen in solr.xml).

 Question: is there a way to specify this parameter so it can be set to 'false'
 via the collection API?

 Problem: If I manually set loadOnStartup=false for the core I had the exception
 below when I used CloudSolrServer to query the core:
 Error: org.apache.solr.client.solrj.SolrServerException: No live SolrServers
 available to handle this request

 Seems to me that CloudSolrServer will not trigger the core to be loaded.

 Is it possible to get the core loaded using CloudSolrServer?

 Regards,
 Patrick




Re: About Suggestions

2013-07-16 Thread Erick Erickson
Maybe it was lost, I tend to babble on... But use a copyField directive
to a field that doesn't have the EdgeNGramFilterFactory in the chain
and get your suggestions from _that_ field rather than the one you
use currently. You can still search etc. on the one you now
have, just get your suggestions from the copied field.
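
Concretely, something like this in the schema (field and type names here are
illustrative):

code
<field name="suggest_full" type="text_parts_whole" indexed="true"
       stored="true" multiValued="true"/>
<copyField source="suggest" dest="suggest_full"/>
/code

where text_parts_whole is the text_parts type from this thread minus the
EdgeNGramFilterFactory. Point the terms component at suggest_full when you
need to know whether a suggestion is a complete part number.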

Best
Erick

On Tue, Jul 16, 2013 at 8:39 AM, Lochschmied, Alexander
alexander.lochschm...@vishay.com wrote:
 Thanks Erick, that is what I suspected. We are very happy with the four
 suggestions in the example (and all the others), but we would like to know
 which of them represents a full part number.
 Can you elaborate a little more on how that could be achieved?

 Best regards,
 Alexander

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Tuesday, July 16, 2013 14:09
 To: solr-user@lucene.apache.org
 Subject: Re: About Suggestions

 Garbage in, garbage out <g>

 Your indexing analysis chain is breaking up the tokens via the 
 EdgeNgramTokenizer and _putting those values in the index_.
 Then the TermsComponent is looking _only_ at the tokens in the index and 
 giving you back exactly what you're asking for.

 So no, there's no way with that analysis chain to get only complete terms; at
 that level the fact that a term was part of a larger input token has been
 lost. In fact, if you were to enter something like terms.prefix=1n1 you'd
 likely see all your 3-grams that start with 1n1 etc.

 So use a copyfield and put these in a separate field that has only whole
 tokens, or just take the EdgeNGram filter out of your current definition. If
 the latter, blow away your index and re-index from scratch.

 Best
 Erick

 On Tue, Jul 16, 2013 at 4:48 AM, Lochschmied, Alexander 
 alexander.lochschm...@vishay.com wrote:
 Hi Erick and everybody else!

 Thanks for trying to help. Here is the example:

 .../terms?terms.regex.flag=case_insensitive&terms.fl=suggest&terms=true&terms.limit=20&terms.sort=index&terms.prefix=1n1187

 returns

 <int name="1n1187">1</int>
 <int name="1n1187a">1</int>
 <int name="1n1187r">1</int>
 <int name="1n1187ra">1</int>

 This list contains 3 complete part numbers but the third item (1n1187r) is 
 not a complete part number. Is there a way to make terms tell if a term 
 represents a complete value?
 (My guess is that this gets lost after ngram but I'm still hoping
 something can be done.)

 More config details:

 <field name="suggest" type="text_parts" indexed="true" stored="true"
        required="false" multiValued="true"/>

 and

 <fieldType name="text_parts" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
             maxGramSize="20" side="front"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 Thanks,
 Alexander


 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Saturday, July 13, 2013 19:58
 To: solr-user@lucene.apache.org
 Subject: Re: About Suggestions

 Not quite sure what you mean here, a couple of examples would help.

 But since the field is using the keyword tokenizer, each thing you get back
 is a complete term, by definition. So I'm not quite sure what you're asking
 here.

 Best
 Erick

 On Fri, Jul 12, 2013 at 4:48 AM, Lochschmied, Alexander 
 alexander.lochschm...@vishay.com wrote:
 Hi Solr people!

 We need to suggest part numbers in alphabetical order, adding up to four
 characters to the already entered part number prefix. That works quite well
 with the terms component acting on a multivalued field with keyword tokenizer
 and edge nGram filter. I am mentioning part numbers to indicate that each
 item in the multivalued field is a string without whitespace and where
 special characters like dashes cannot be seen as separators.

 Is there a way to know if the term (the suggestion) represents such a 
 complete part number (without doing another query for each suggestion)?

 Since we are using SolrJ, what we would need is something like
 boolean Term.isRepresentingCompleteFieldValue()

 Thanks,
 Alexander


Re: Need advice on performing 300 queries per second on solr index

2013-07-16 Thread Ralf Heyde

Hello,

1. It depends on your query types & data (complexity, feature set,
paging) - geospatial queries can involve heavy calculation inside Solr.
2. It depends massively on the document size & field selection (loading a
hundred 100MB documents can take some time).
3. It depends especially on your disk IO / RAM utilization - are these
dedicated machines?
4. It depends on how often you change your documents (cache
warm-ups!!!, disk IO)!
5. What is the bottleneck? CPU? RAM? Disk? You should be able to give
some more information about this.
6. It depends on the number of cores (more cores are not necessarily better -
CPU caching, OS management overhead...).
7. Force a higher cache hit rate - that means: control the types of queries,
cluster them, and send them to server A or B to have a higher chance of a
cache hit.


Maybe you can give some more details about the points I mentioned.

Ralf

On 07/16/2013 04:42 PM, adfel70 wrote:

Hi
I need to create a solr cluster that contains geospatial information and
provides the ability to perform a few hundred queries per second; each
query should retrieve around 100k results.
The data is around 100k documents, around 300GB total.

I started with a 2-shard cluster (replicationFactor 1) and a portion of the
data - 20GB.

I run some load-tests and see that when 100 requests are sent in one second,
the average qTime is around 4 seconds, but the average total response time
(measured from sending the request to Solr until getting a response)
reaches 20-25 seconds, which is very bad.

Currently I load-balance myself between the 2 solr servers (each request is
sent to a different server).

Any advice on which resources I need and what my solr cluster should look
like?
More shards? More replicas? Another webserver?

Thanks.




