Magento solr Invalid Date String:'false'

2013-08-26 Thread Nikesh12
We are getting the message below during Solr indexing, which runs from a cron job
configured in Magento.


Aug 12, 2013 8:06:15 AM org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: {add=[24P1602]} 0 1
Aug 12, 2013 8:06:16 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: ERROR: [doc=24P1602] Error
adding field 'lepubdate_datetime'='false'
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:333)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
at
org.apache.solr.handler.JsonLoader.processUpdate(JsonLoader.java:100)
at org.apache.solr.handler.JsonLoader.load(JsonLoader.java:75)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: Invalid Date String:'false'
at org.apache.solr.schema.DateField.parseMath(DateField.java:161)
at org.apache.solr.schema.TrieField.createField(TrieField.java:419)
at
org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:120)
at
org.apache.solr.schema.SchemaField.createField(SchemaField.java:103)
at
org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:286)
... 22 more

Aug 12, 2013 8:06:16 AM org.apache.solr.core.SolrCore execute 
=

Best to post to the solr-user list rather than general, but looks like
you've got a type mismatch:

'lepubdate_datetime'='false'

What type is lepubdate_datetime?   I'm guessing it's a "date" type and
shouldn't be getting the value 'false' :)
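For reference, Solr date fields only accept full ISO 8601 timestamps (DateField.parseMath
is what throws in the trace above), so the indexer has to send a value like the one below
rather than the literal false. A minimal sketch, assuming lepubdate_datetime is covered by
a *_datetime dynamic field of a Trie date type (names here are illustrative, not taken
from the Magento schema):

  <dynamicField name="*_datetime" type="tdate" indexed="true" stored="false"/>
  <!-- a value Solr will accept for that field -->
  <field name="lepubdate_datetime">2013-08-12T08:06:15Z</field>

When the attribute has no value, the Magento indexer apparently sends 'false' instead of
omitting the field, which is what Solr rejects.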

Erik 

===

Hi Erik,

Can you please let me know where I should look to correct the issue? In the
database I have found a "lepubdate" field in the "eav_attribute" table with
"backend_type" set to "datetime", but there is no field called
'lepubdate_datetime' in the database. Yet Solr reports
'lepubdate_datetime'='false' in its log.


Thanks
Nikesh 





Re: More on topic of Meta-search/Federated Search with Solr

2013-08-26 Thread Paul Libbrecht
Dan,

if you're bound to federated search then I would say that you need to work on 
the service guarantees of each of the nodes and, maybe, create strategies to 
cope with bad nodes.

paul


On 26 August 2013 at 22:57, Dan Davis wrote:

> First answer:
> 
> My employer is a library and does not have a license to harvest everything
> indexed by a "web-scale discovery service" such as PRIMO or Summon.If
> our design automatically relays searches entered by users, and then
> periodically purges results, I think it is reasonable from a licensing
> perspective.
> 
> Second answer:
> 
> What if you wanted your Apache Solr powered search to include all results
> from Google scholar to any query?   Do you think you could easily or
> cheaply configure a Zookeeper cluster large enough to harvest and index all
> of Google Scholar?   Would that violate robot rules?   Is it even possible
> to do this from an API perspective?   Wouldn't google notice?
> 
> Third answer:
> 
> On Gartner's 2013 Enterprise Search Magic Quadrant, LucidWorks and the
> other Enterprise Search firm based on Apache Solr were dinged on the lack
> of Federated Search.  I do not have the hubris to think I can fix that, and
> it is not really my role to try, but something that works without
> Harvesting and local indexing is obviously desirable to Enterprise Search
> users.
> 
> 
> 
> On Mon, Aug 26, 2013 at 4:46 PM, Paul Libbrecht  wrote:
> 
>> 
>> Why not simply create a meta search engine that indexes everything of each
>> of the nodes.?
>> (I think one calls this harvesting)
>> 
>> I believe that this the way to avoid all sorts of performance bottleneck.
>> As far as I could analyze, the performance of a federated search is the
>> performance of the least speedy node; which can turn to be quite bad if you
>> do not exercise guarantees of remote sources.
>> 
>> Or are the "remote cores" below actually things that you manage on your
>> side? If yes guarantees are easy to manage..
>> 
>> Paul
>> 
>> 
>> On 26 August 2013 at 22:38, Dan Davis wrote:
>> 
>>> I have now come to the task of estimating man-days to add "Blended Search
>>> Results" to Apache Solr.   The argument has been made that this is not
>>> desirable (see Jonathan Rochkind's blog entries on Bento search with
>>> blacklight).   But the estimate remains.   No estimate is worth much
>>> without a design.   So, I have come to the difficulty of estimating this
>>> without having an in-depth knowledge of the Apache core.   Here is my
>>> design, likely imperfect, as it stands.
>>> 
>>>  - Configure a core specific to each search source (local or remote)
>>>  - On cores that index remote content, implement a periodic delete query
>>>  that deletes documents whose timestamp is too old
>>>  - Implement a custom requestHandler for the "remote" cores that goes
>> out
>>>  and queries the remote source.   For each result in the top N
>>>  (configurable), it computes an id that is stable (e.g. it is based on
>> the
>>>  remote resource URL, doi, or hash of data returned).   It uses that id
>> to
>>>  look-up the document in the lucene database.   If the data is not
>> there, it
>>>  updates the lucene core and sets a flag that commit is required.
>> Once it
>>>  is done, it commits if needed.
>>>  - Configure a core that uses a custom SearchComponent to call the
>>>  requestHandler that goes and gets new documents and commits them.
>> Since
>>>  the cores for remote content are different cores, they can restart
>> their
>>>  searcher at this point if any commit is needed.   The custom
>>>  SearchComponent will wait for commit and reload to be completed.
>> Then,
>>>  search continues uses the other cores as "shards".
>>>  - Auto-warming on this will assure that the most recently requested
>> data
>>>  is present.
>>> 
>>> It will, of course, be very slow a good part of the time.
>>> 
>>> Erik and others, I need to know whether this design has legs and what
>> other
>>> alternatives I might consider.
>>> 
>>> 
>>> 
>>> On Sun, Aug 18, 2013 at 3:14 PM, Erick Erickson >> wrote:
>>> 
 The lack of global TF/IDF has been answered in the past,
 in the sharded case, by "usually you have similar enough
 stats that it doesn't matter". This pre-supposes a fairly
 evenly distributed set of documents.
 
 But if you're talking about federated search across different
 types of documents, then what would you "rescore" with?
 How would you even consider scoring docs that are somewhat/
 totally different? Think magazine articles and meta-data associated
 with pictures.
 
 What I've usually found is that one can use grouping to show
 the top N of a variety of results. Or show tabs with different
 types. Or have the app intelligently combine the different types
 of documents in a way that "makes sense". But I don't know
 how you'd just get "the right thing" to happen with some kind
 of scoring magic.
 
 Best
 Erick
 
 
 O

Re: Default query operator "OR" wont work in some cases

2013-08-26 Thread Jack Krupansky
Yeah, sorry, I read the parsed query too quickly - the phrase is the 
optional relevancy boost due to the pf2 parameter.
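A quick way to see where each clause comes from is to vary the edismax parameters on the
request; a hedged example (the keywords field and ^2 boost are inferred from the parsed
query posted earlier in the thread, while the handler path and mm value are assumptions):

  http://localhost:8983/solr/collection1/select?q=egg+salad&defType=edismax&qf=keywords^2&pf2=keywords&mm=1&debugQuery=true

With mm=1 either term is enough to match (OR-like behaviour), while pf2 only contributes
the optional "egg salad" phrase boost that shows up in the parsed query.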


-- Jack Krupansky

-Original Message- 
From: smanad

Sent: Monday, August 26, 2013 10:08 PM
To: solr-user@lucene.apache.org
Subject: Re: Default query operator "OR" wont work in some cases

I am not searching for a phrase query, so I am not sure why it shows up in
the parsed query.

 0
 3
 
   true
   true
   egg salad

   1377569284170
   xml
 







Re: Filter cache pollution during sharded edismax queries

2013-08-26 Thread Ken Krugler
Hi Otis,

Sorry I missed your reply, and thanks for trying to find a similar report.

Wondering if I should file a Jira issue? That might get more attention :)

-- Ken

On Jul 5, 2013, at 1:05pm, Otis Gospodnetic wrote:

> Hi Ken,
> 
> Uh, I left this email until now hoping I could find you a reference to
> similar reports, but I can't find them now.  I am quite sure I saw
> somebody with a similar report within the last month.  Plus, several
> people have reported issues with performance dropping when they went
> from 3.x to 4.x and maybe this is why.
> 
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
> 
> 
> 
> On Tue, Jul 2, 2013 at 3:01 PM, Ken Krugler  
> wrote:
>> Hi all,
>> 
>> After upgrading from Solr 3.5 to 4.2.1, I noticed our filterCache hit ratio 
>> had dropped significantly.
>> 
>> Previously it was at 95+%, but now it's < 50%.
>> 
>> I enabled recording 100 entries for debugging, and in looking at them it 
>> seems that edismax (and faceting) is creating entries for me.
>> 
>> This is in a sharded setup, so it's a distributed search.
>> 
>> If I do a search for the string "bogus text" using edismax on two fields, I 
>> get an entry in each of the shard's filter caches that looks like:
>> 
>> item_+(((field1:bogus | field2:bogu) (field1:text | field2:text))~2):
>> 
>> Is this expected?
>> 
>> I have a similar situation happening during faceted search, even though my 
>> fields are single-value/untokenized strings, and I'm not using the enum 
>> facet method.
>> 
>> But I'll get many, many entries in the filterCache for facet values, and 
>> they all look like "item_::"
>> 
>> The net result of the above is that even with a very big filterCache size of 
>> 2K, the hit ratio is still only 60%.
>> 
>> Thanks for any insights,
>> 
>> -- Ken

--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr







Re: Default query operator "OR" wont work in some cases

2013-08-26 Thread smanad
I am not searching for a phrase query, so I am not sure why it shows up in
the parsed query.

  0
  3
  
true
true
egg salad

1377569284170
xml
  






ANNOUNCE: Lucene/Solr Revolution EU 2013: Registration & Community Voting

2013-08-26 Thread Chris Hostetter


(NOTE: cross-posted to various lists, please reply only to general@lucene 
w/ any questions or follow ups)



2 Announcements folks should be aware of regarding the upcoming 
Lucene/Solr Revolution EU 2013 in Dublin...



# 1) Registration Now Open

Registration is now open for Lucene/Solr Revolution EU 2013, the biggest 
open source conference dedicated to Apache Lucene/Solr.  Two-day training 
workshops will precede the conference.  You can benefit from discounted 
conference rates if you register early.


http://lucenerevolution.org/registration

More info...
http://searchhub.org/2013/08/15/lucenesolr-revolution-eu-registration-is-open/


# 2) Community Voting on Agenda (Until September 9th)

The Lucene/Solr Revolution free voting system allows you to vote on your 
favorite topics. The sessions that receive the highest number of votes 
will be automatically added to the Lucene/Solr Revolution EU 2013 agenda. 
The remaining sessions will be selected by a committee of industry experts 
who will take into account the community’s votes as well as their own 
expertise in the area.


http://lucenerevolution.org/2013/call-for-papers-survey

More info...
http://searchhub.org/2013/08/23/help-us-set-the-agenda-for-lucenesolr-revolution-eu/

-Hoss

Re: Default query operator "OR" wont work in some cases

2013-08-26 Thread Jack Krupansky
The phrase "egg salad" does not occur in your input. And, quoted phrases are 
an implicit "AND", not an "OR". Either you wanted "egg" and "salad" but not 
as a phrase, or as a very loose sloppy phrase, such as "egg salad"~10.


Or, who knows what you really want - your requirements are expressed too 
imprecisely.


-- Jack Krupansky

-Original Message- 
From: smanad

Sent: Monday, August 26, 2013 8:50 PM
To: solr-user@lucene.apache.org
Subject: Re: Default query operator "OR" wont work in some cases

here is keywords field for 3 docs,

"Simply Asia products,Simply Asia,Sesame Chicken Egg Drop Soup,Soy Ginger
Shrimp and Noodle Salad,Sesame Teriyaki Noodle Bowl"

"Eggs,AllWhites,Better'n Eggs,Foods,AllWhites or Better'n Eggs"

"DOLE Salad Blend Salad Kit,Salad Kit,Salad,DOLE,produce"

Here is my debug query:
(+((DisjunctionMaxQuery((keywords:egg^2.0)~0.1)
DisjunctionMaxQuery((keywords:salad^2.0)~0.1))~2)
DisjunctionMaxQuery((keywords:"egg salad")~0.1) /no_coord

Here is my fieldtype definition for keywords,
   
 
   
   
   
   
   
   
 
 
   
   
   
   
   
   
 
   







Re: More on topic of Meta-search/Federated Search with Solr

2013-08-26 Thread Amit Jha
Hi,

I would suggest the following.

1. Create a custom search connector for each individual source.
2. The connector is responsible for querying the source (of any type: web, gateways,
etc.), getting the results, and writing the top N results to Solr (see the sketch
after this list).
3. Query Solr with the same keyword and display the results.
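A minimal sketch of step 2, pushing one harvested result into Solr through the JSON
update handler (host, core name and field names are placeholders, not taken from any
poster's setup):

  curl 'http://localhost:8983/solr/federated/update/json?commit=true' \
       -H 'Content-Type: application/json' \
       -d '[{"id":"gateway1-doc42","title":"Example harvested result","source":"gateway1"}]'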

Would you like to create something like
http://knimbus.com


Rgds
AJ

On 27-Aug-2013, at 2:28, Dan Davis  wrote:

> One more question here - is this topic more appropriate to a different list?
> 
> 
> On Mon, Aug 26, 2013 at 4:38 PM, Dan Davis  wrote:
> 
>> I have now come to the task of estimating man-days to add "Blended Search
>> Results" to Apache Solr.   The argument has been made that this is not
>> desirable (see Jonathan Rochkind's blog entries on Bento search with
>> blacklight).   But the estimate remains.   No estimate is worth much
>> without a design.   So, I have come to the difficulty of estimating this
>> without having an in-depth knowledge of the Apache core.   Here is my
>> design, likely imperfect, as it stands.
>> 
>>   - Configure a core specific to each search source (local or remote)
>>   - On cores that index remote content, implement a periodic delete
>>   query that deletes documents whose timestamp is too old
>>   - Implement a custom requestHandler for the "remote" cores that goes
>>   out and queries the remote source.   For each result in the top N
>>   (configurable), it computes an id that is stable (e.g. it is based on the
>>   remote resource URL, doi, or hash of data returned).   It uses that id to
>>   look-up the document in the lucene database.   If the data is not there, it
>>   updates the lucene core and sets a flag that commit is required.   Once it
>>   is done, it commits if needed.
>>   - Configure a core that uses a custom SearchComponent to call the
>>   requestHandler that goes and gets new documents and commits them.   Since
>>   the cores for remote content are different cores, they can restart their
>>   searcher at this point if any commit is needed.   The custom
>>   SearchComponent will wait for commit and reload to be completed.   Then,
>>   search continues uses the other cores as "shards".
>>   - Auto-warming on this will assure that the most recently requested
>>   data is present.
>> 
>> It will, of course, be very slow a good part of the time.
>> 
>> Erik and others, I need to know whether this design has legs and what
>> other alternatives I might consider.
>> 
>> 
>> 
>> On Sun, Aug 18, 2013 at 3:14 PM, Erick Erickson 
>> wrote:
>> 
>>> The lack of global TF/IDF has been answered in the past,
>>> in the sharded case, by "usually you have similar enough
>>> stats that it doesn't matter". This pre-supposes a fairly
>>> evenly distributed set of documents.
>>> 
>>> But if you're talking about federated search across different
>>> types of documents, then what would you "rescore" with?
>>> How would you even consider scoring docs that are somewhat/
>>> totally different? Think magazine articles and meta-data associated
>>> with pictures.
>>> 
>>> What I've usually found is that one can use grouping to show
>>> the top N of a variety of results. Or show tabs with different
>>> types. Or have the app intelligently combine the different types
>>> of documents in a way that "makes sense". But I don't know
>>> how you'd just get "the right thing" to happen with some kind
>>> of scoring magic.
>>> 
>>> Best
>>> Erick
>>> 
>>> 
>>> On Fri, Aug 16, 2013 at 4:07 PM, Dan Davis  wrote:
>>> 
 I've thought about it, and I have no time to really do a meta-search
 during
 evaluation.  What I need to do is to create a single core that contains
 both of my data sets, and then describe the architecture that would be
 required to do blended results, with liberal estimates.
 
 From the perspective of evaluation, I need to understand whether any of
 the
 solutions to better ranking in the absence of global IDF have been
 explored?I suspect that one could retrieve a much larger than N set
 of
 results from a set of shards, re-score in some way that doesn't require
 IDF, e.g. storing both results in the same priority queue and
 *re-scoring*
 before *re-ranking*.
 
 The other way to do this would be to have a custom SearchHandler that
 works
 differently - it performs the query, retries all results deemed relevant
 by
 another engine, adds them to the Lucene index, and then performs the
 query
 again in the standard way.   This would be quite slow, but perhaps useful
 as a way to evaluate my method.
 
 I still welcome any suggestions on how such a SearchHandler could be
 implemented.
>> 


Re: Default query operator "OR" wont work in some cases

2013-08-26 Thread smanad
here is keywords field for 3 docs, 

"Simply Asia products,Simply Asia,Sesame Chicken Egg Drop Soup,Soy Ginger
Shrimp and Noodle Salad,Sesame Teriyaki Noodle Bowl"

"Eggs,AllWhites,Better'n Eggs,Foods,AllWhites or Better'n Eggs"

"DOLE Salad Blend Salad Kit,Salad Kit,Salad,DOLE,produce"

Here is my debug query:
(+((DisjunctionMaxQuery((keywords:egg^2.0)~0.1)
DisjunctionMaxQuery((keywords:salad^2.0)~0.1))~2)
DisjunctionMaxQuery((keywords:"egg salad")~0.1) /no_coord

Here is my fieldtype definition for keywords,

  






  
  






  







solr-user@lucene.apache.org

2013-08-26 Thread Erick Erickson
First thing to do is attach &debug=query to your queries and look at the
parsed output.

Second thing to do is look at the admin/analysis page and see what happens
at index and query time to things like o'reilly. You have
WordDelimiterFilterFactory
configured in your query analysis chain but not your index analysis chain. My bet
is that you're getting different tokens at query and index time...

Third thing is that you need to escape the & character. It's probably being
interpreted as a delimiter on the URL and Solr ignores params it doesn't
understand.
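
For the third point, a hedged example of what the escaped requests could look like
(host and handler path are placeholders):

  http://localhost:8983/solr/collection1/select?q=m%26m&debugQuery=true
  http://localhost:8983/solr/collection1/select?q=o%27reilly&debugQuery=true

Without the %26 the servlet container treats the & as a parameter separator, so Solr
only ever sees q=m.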

Best
Erick


On Mon, Aug 26, 2013 at 5:08 PM, Utkarsh Sengar wrote:

> Some of the queries (not all) with special chars return no documents.
>
> Example: queries returning no documents
> q=m&m (this can be explained, when I search for "m m", no documents are
> returned)
> q=o'reilly (when I search for "o reilly", I get documents back)
>
>
> Queries returning documents:
> q=hello&world (document matched is "Hello World: A Life in Ham Radio")
>
>
> My questions are:
> 1. What's wrong with "o'reilly"? What changes do I need in my field type?
> 2. How can I make the query "m&m" work?
> My index has a bunch of M&M's docs like: "M & M's Milk Chocolate Candy
> Coated Peanuts  19.2 oz" and ""M and Ms Chocolate Candies - Peanut - 1 Bag
> (42 oz)"
>
>
> Field type:
>  positionIncrementGap="100">
>  
>   
>words="stopwords.txt" enablePositionIncrements="true" />
>   
>   
>   
>   
> 
> 
>generateWordParts="1" generateNumberParts="1"
>
> catenateWords="1"
>
> catenateNumbers="1"
>
> catenateAll="0"
>
> preserveOriginal="1"/>
>   
>words="stopwords.txt" enablePositionIncrements="true" />
>   
>   
>   
>   
> 
> 
>
>
> --
> Thanks,
> -Utkarsh
>


solr-user@lucene.apache.org

2013-08-26 Thread Utkarsh Sengar
Some of the queries (not all) with special chars return no documents.

Example: queries returning no documents
q=m&m (this can be explained, when I search for "m m", no documents are
returned)
q=o'reilly (when I search for "o reilly", I get documents back)


Queries returning documents:
q=hello&world (document matched is "Hello World: A Life in Ham Radio")


My questions are:
1. What's wrong with "o'reilly"? What changes do I need in my field type?
2. How can I make the query "m&m" work?
My index has a bunch of M&M's docs like: "M & M's Milk Chocolate Candy
Coated Peanuts  19.2 oz" and ""M and Ms Chocolate Candies - Peanut - 1 Bag
(42 oz)"


Field type:

 
  
  
  
  
  
  


  
  
  
  
  
  
  




-- 
Thanks,
-Utkarsh


Re: More on topic of Meta-search/Federated Search with Solr

2013-08-26 Thread Dan Davis
One more question here - is this topic more appropriate to a different list?


On Mon, Aug 26, 2013 at 4:38 PM, Dan Davis  wrote:

> I have now come to the task of estimating man-days to add "Blended Search
> Results" to Apache Solr.   The argument has been made that this is not
> desirable (see Jonathan Rochkind's blog entries on Bento search with
> blacklight).   But the estimate remains.   No estimate is worth much
> without a design.   So, I have come to the difficulty of estimating this
> without having an in-depth knowledge of the Apache core.   Here is my
> design, likely imperfect, as it stands.
>
>- Configure a core specific to each search source (local or remote)
>- On cores that index remote content, implement a periodic delete
>query that deletes documents whose timestamp is too old
>- Implement a custom requestHandler for the "remote" cores that goes
>out and queries the remote source.   For each result in the top N
>(configurable), it computes an id that is stable (e.g. it is based on the
>remote resource URL, doi, or hash of data returned).   It uses that id to
>look-up the document in the lucene database.   If the data is not there, it
>updates the lucene core and sets a flag that commit is required.   Once it
>is done, it commits if needed.
>- Configure a core that uses a custom SearchComponent to call the
>requestHandler that goes and gets new documents and commits them.   Since
>the cores for remote content are different cores, they can restart their
>searcher at this point if any commit is needed.   The custom
>SearchComponent will wait for commit and reload to be completed.   Then,
>search continues uses the other cores as "shards".
>- Auto-warming on this will assure that the most recently requested
>data is present.
>
> It will, of course, be very slow a good part of the time.
>
> Erik and others, I need to know whether this design has legs and what
> other alternatives I might consider.
>
>
>
> On Sun, Aug 18, 2013 at 3:14 PM, Erick Erickson 
> wrote:
>
>> The lack of global TF/IDF has been answered in the past,
>> in the sharded case, by "usually you have similar enough
>> stats that it doesn't matter". This pre-supposes a fairly
>> evenly distributed set of documents.
>>
>> But if you're talking about federated search across different
>> types of documents, then what would you "rescore" with?
>> How would you even consider scoring docs that are somewhat/
>> totally different? Think magazine articles and meta-data associated
>> with pictures.
>>
>> What I've usually found is that one can use grouping to show
>> the top N of a variety of results. Or show tabs with different
>> types. Or have the app intelligently combine the different types
>> of documents in a way that "makes sense". But I don't know
>> how you'd just get "the right thing" to happen with some kind
>> of scoring magic.
>>
>> Best
>> Erick
>>
>>
>> On Fri, Aug 16, 2013 at 4:07 PM, Dan Davis  wrote:
>>
>>> I've thought about it, and I have no time to really do a meta-search
>>> during
>>> evaluation.  What I need to do is to create a single core that contains
>>> both of my data sets, and then describe the architecture that would be
>>> required to do blended results, with liberal estimates.
>>>
>>> From the perspective of evaluation, I need to understand whether any of
>>> the
>>> solutions to better ranking in the absence of global IDF have been
>>> explored?I suspect that one could retrieve a much larger than N set
>>> of
>>> results from a set of shards, re-score in some way that doesn't require
>>> IDF, e.g. storing both results in the same priority queue and
>>> *re-scoring*
>>> before *re-ranking*.
>>>
>>> The other way to do this would be to have a custom SearchHandler that
>>> works
>>> differently - it performs the query, retries all results deemed relevant
>>> by
>>> another engine, adds them to the Lucene index, and then performs the
>>> query
>>> again in the standard way.   This would be quite slow, but perhaps useful
>>> as a way to evaluate my method.
>>>
>>> I still welcome any suggestions on how such a SearchHandler could be
>>> implemented.
>>>
>>
>>
>


Re: More on topic of Meta-search/Federated Search with Solr

2013-08-26 Thread Dan Davis
First answer:

My employer is a library and does not have a license to harvest everything
indexed by a "web-scale discovery service" such as PRIMO or Summon.If
our design automatically relays searches entered by users, and then
periodically purges results, I think it is reasonable from a licensing
perspective.

Second answer:

What if you wanted your Apache Solr powered search to include all results
from Google scholar to any query?   Do you think you could easily or
cheaply configure a Zookeeper cluster large enough to harvest and index all
of Google Scholar?   Would that violate robot rules?   Is it even possible
to do this from an API perspective?   Wouldn't google notice?

Third answer:

On Gartner's 2013 Enterprise Search Magic Quadrant, LucidWorks and the
other Enterprise Search firm based on Apache Solr were dinged on the lack
of Federated Search.  I do not have the hubris to think I can fix that, and
it is not really my role to try, but something that works without
Harvesting and local indexing is obviously desirable to Enterprise Search
users.



On Mon, Aug 26, 2013 at 4:46 PM, Paul Libbrecht  wrote:

>
> Why not simply create a meta search engine that indexes everything of each
> of the nodes.?
> (I think one calls this harvesting)
>
> I believe that this the way to avoid all sorts of performance bottleneck.
> As far as I could analyze, the performance of a federated search is the
> performance of the least speedy node; which can turn to be quite bad if you
> do not exercise guarantees of remote sources.
>
> Or are the "remote cores" below actually things that you manage on your
> side? If yes guarantees are easy to manage..
>
> Paul
>
>
> On 26 August 2013 at 22:38, Dan Davis wrote:
>
> > I have now come to the task of estimating man-days to add "Blended Search
> > Results" to Apache Solr.   The argument has been made that this is not
> > desirable (see Jonathan Rochkind's blog entries on Bento search with
> > blacklight).   But the estimate remains.   No estimate is worth much
> > without a design.   So, I have come to the difficulty of estimating this
> > without having an in-depth knowledge of the Apache core.   Here is my
> > design, likely imperfect, as it stands.
> >
> >   - Configure a core specific to each search source (local or remote)
> >   - On cores that index remote content, implement a periodic delete query
> >   that deletes documents whose timestamp is too old
> >   - Implement a custom requestHandler for the "remote" cores that goes
> out
> >   and queries the remote source.   For each result in the top N
> >   (configurable), it computes an id that is stable (e.g. it is based on
> the
> >   remote resource URL, doi, or hash of data returned).   It uses that id
> to
> >   look-up the document in the lucene database.   If the data is not
> there, it
> >   updates the lucene core and sets a flag that commit is required.
> Once it
> >   is done, it commits if needed.
> >   - Configure a core that uses a custom SearchComponent to call the
> >   requestHandler that goes and gets new documents and commits them.
> Since
> >   the cores for remote content are different cores, they can restart
> their
> >   searcher at this point if any commit is needed.   The custom
> >   SearchComponent will wait for commit and reload to be completed.
> Then,
> >   search continues uses the other cores as "shards".
> >   - Auto-warming on this will assure that the most recently requested
> data
> >   is present.
> >
> > It will, of course, be very slow a good part of the time.
> >
> > Erik and others, I need to know whether this design has legs and what
> other
> > alternatives I might consider.
> >
> >
> >
> > On Sun, Aug 18, 2013 at 3:14 PM, Erick Erickson  >wrote:
> >
> >> The lack of global TF/IDF has been answered in the past,
> >> in the sharded case, by "usually you have similar enough
> >> stats that it doesn't matter". This pre-supposes a fairly
> >> evenly distributed set of documents.
> >>
> >> But if you're talking about federated search across different
> >> types of documents, then what would you "rescore" with?
> >> How would you even consider scoring docs that are somewhat/
> >> totally different? Think magazine articles and meta-data associated
> >> with pictures.
> >>
> >> What I've usually found is that one can use grouping to show
> >> the top N of a variety of results. Or show tabs with different
> >> types. Or have the app intelligently combine the different types
> >> of documents in a way that "makes sense". But I don't know
> >> how you'd just get "the right thing" to happen with some kind
> >> of scoring magic.
> >>
> >> Best
> >> Erick
> >>
> >>
> >> On Fri, Aug 16, 2013 at 4:07 PM, Dan Davis  wrote:
> >>
> >>> I've thought about it, and I have no time to really do a meta-search
> >>> during
> >>> evaluation.  What I need to do is to create a single core that contains
> >>> both of my data sets, and then describe the architecture that would be
> >>> required to do blended

Re: Dropping Caches of Machine That Solr Runs At

2013-08-26 Thread Furkan KAMACI
The EOF exception seems like a generic exception to me. I should find the
underlying problem within my infrastructure.

On Monday, 26 August 2013, Walter Underwood wrote:
> We use Amazon EC2 machines with 34GB of memory (m2.2xlarge). The Solr
heap is 8GB. We have several cores, totaling about 14GB on disk. This
configuration allows 100% of the indexes to be in file buffers.
>
> wunder
>
> On Aug 26, 2013, at 9:57 AM, Furkan KAMACI wrote:
>
>> Hi Walter;
>>
>> You said you are caching your documents. What is average Physical Memory
>> usage of your Solr Nodes?
>>
>>
>> 2013/8/26 Walter Underwood 
>>
>>> It looks like that error happens when reading XML from an HTTP request.
The
>>> XML ends too soon. This should be unrelated to file buffers.
>>>
>>> wunder
>>>
>>> On Aug 26, 2013, at 9:17 AM, Furkan KAMACI wrote:
>>>
 It has a 48 GB of RAM and index size is nearly 100 GB at each node. I
>>> have
 CentOS 6.4. While indexing I got that error and I am suspicious about
>>> that
 it is because of high percentage of Physical Memory usage.

 ERROR - 2013-08-21 22:01:30.979; org.apache.solr.common.SolrException;
 java.lang.RuntimeException: [was class
org.eclipse.jetty.io.EofException]
 early EOF
 at

>>>
com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
 at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
 at

>>>
com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
 at
com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
 at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393)
 at

>>>
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:245)
 at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
 at

>>>
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
 at

>>>
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
 at

>>>
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1812)
 at

>>>
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
 at

>>>
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
 at

>>>
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
 at

>>>
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
 at

>>>
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
 at

>>>
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at

>>>
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
 at

>>>
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at

>>>
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
 at
>>>
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
 at

>>>
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at

>>>
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
 at

>>>
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at

>>>
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at

>>>
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at

>>> org.eclipse.jetty.server.handler.--
> Walter Underwood
> wun...@wunderwood.org
>
>
>
>


Re: More on topic of Meta-search/Federated Search with Solr

2013-08-26 Thread Paul Libbrecht

Why not simply create a meta search engine that indexes everything from each of
the nodes? (I think one calls this harvesting.)

I believe that this is the way to avoid all sorts of performance bottlenecks.
As far as I could analyze, the performance of a federated search is the
performance of the slowest node, which can turn out to be quite bad if you do
not enforce guarantees from the remote sources.

Or are the "remote cores" below actually things that you manage on your side?
If yes, guarantees are easy to manage.

Paul


On 26 August 2013 at 22:38, Dan Davis wrote:

> I have now come to the task of estimating man-days to add "Blended Search
> Results" to Apache Solr.   The argument has been made that this is not
> desirable (see Jonathan Rochkind's blog entries on Bento search with
> blacklight).   But the estimate remains.   No estimate is worth much
> without a design.   So, I have come to the difficulty of estimating this
> without having an in-depth knowledge of the Apache core.   Here is my
> design, likely imperfect, as it stands.
> 
>   - Configure a core specific to each search source (local or remote)
>   - On cores that index remote content, implement a periodic delete query
>   that deletes documents whose timestamp is too old
>   - Implement a custom requestHandler for the "remote" cores that goes out
>   and queries the remote source.   For each result in the top N
>   (configurable), it computes an id that is stable (e.g. it is based on the
>   remote resource URL, doi, or hash of data returned).   It uses that id to
>   look-up the document in the lucene database.   If the data is not there, it
>   updates the lucene core and sets a flag that commit is required.   Once it
>   is done, it commits if needed.
>   - Configure a core that uses a custom SearchComponent to call the
>   requestHandler that goes and gets new documents and commits them.   Since
>   the cores for remote content are different cores, they can restart their
>   searcher at this point if any commit is needed.   The custom
>   SearchComponent will wait for commit and reload to be completed.   Then,
>   search continues uses the other cores as "shards".
>   - Auto-warming on this will assure that the most recently requested data
>   is present.
> 
> It will, of course, be very slow a good part of the time.
> 
> Erik and others, I need to know whether this design has legs and what other
> alternatives I might consider.
> 
> 
> 
> On Sun, Aug 18, 2013 at 3:14 PM, Erick Erickson 
> wrote:
> 
>> The lack of global TF/IDF has been answered in the past,
>> in the sharded case, by "usually you have similar enough
>> stats that it doesn't matter". This pre-supposes a fairly
>> evenly distributed set of documents.
>> 
>> But if you're talking about federated search across different
>> types of documents, then what would you "rescore" with?
>> How would you even consider scoring docs that are somewhat/
>> totally different? Think magazine articles and meta-data associated
>> with pictures.
>> 
>> What I've usually found is that one can use grouping to show
>> the top N of a variety of results. Or show tabs with different
>> types. Or have the app intelligently combine the different types
>> of documents in a way that "makes sense". But I don't know
>> how you'd just get "the right thing" to happen with some kind
>> of scoring magic.
>> 
>> Best
>> Erick
>> 
>> 
>> On Fri, Aug 16, 2013 at 4:07 PM, Dan Davis  wrote:
>> 
>>> I've thought about it, and I have no time to really do a meta-search
>>> during
>>> evaluation.  What I need to do is to create a single core that contains
>>> both of my data sets, and then describe the architecture that would be
>>> required to do blended results, with liberal estimates.
>>> 
>>> From the perspective of evaluation, I need to understand whether any of
>>> the
>>> solutions to better ranking in the absence of global IDF have been
>>> explored?I suspect that one could retrieve a much larger than N set of
>>> results from a set of shards, re-score in some way that doesn't require
>>> IDF, e.g. storing both results in the same priority queue and *re-scoring*
>>> before *re-ranking*.
>>> 
>>> The other way to do this would be to have a custom SearchHandler that
>>> works
>>> differently - it performs the query, retries all results deemed relevant
>>> by
>>> another engine, adds them to the Lucene index, and then performs the query
>>> again in the standard way.   This would be quite slow, but perhaps useful
>>> as a way to evaluate my method.
>>> 
>>> I still welcome any suggestions on how such a SearchHandler could be
>>> implemented.
>>> 
>> 
>> 



Re: More on topic of Meta-search/Federated Search with Solr

2013-08-26 Thread Dan Davis
I have now come to the task of estimating man-days to add "Blended Search
Results" to Apache Solr.   The argument has been made that this is not
desirable (see Jonathan Rochkind's blog entries on Bento search with
blacklight).   But the estimate remains.   No estimate is worth much
without a design.   So, I have come to the difficulty of estimating this
without having an in-depth knowledge of the Apache core.   Here is my
design, likely imperfect, as it stands.

   - Configure a core specific to each search source (local or remote)
   - On cores that index remote content, implement a periodic delete query
   that deletes documents whose timestamp is too old (a sketch follows this list)
   - Implement a custom requestHandler for the "remote" cores that goes out
   and queries the remote source.   For each result in the top N
   (configurable), it computes an id that is stable (e.g. it is based on the
   remote resource URL, doi, or hash of data returned).   It uses that id to
   look-up the document in the lucene database.   If the data is not there, it
   updates the lucene core and sets a flag that commit is required.   Once it
   is done, it commits if needed.
   - Configure a core that uses a custom SearchComponent to call the
   requestHandler that goes and gets new documents and commits them.   Since
   the cores for remote content are different cores, they can restart their
   searcher at this point if any commit is needed.   The custom
   SearchComponent will wait for commit and reload to be completed.   Then,
   search continues using the other cores as "shards".
   - Auto-warming on this will assure that the most recently requested data
   is present.
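
A hedged sketch of the purge in the second bullet, expressed as a delete-by-query with
Solr date math (the indexed_at field name is an assumption; any field that records the
harvest time would do):

  curl 'http://localhost:8983/solr/remote_core/update?commit=true' \
       -H 'Content-Type: text/xml' \
       --data-binary '<delete><query>indexed_at:[* TO NOW-7DAYS]</query></delete>'

Run from cron, this keeps only documents harvested within the last week.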

It will, of course, be very slow a good part of the time.

Erik and others, I need to know whether this design has legs and what other
alternatives I might consider.



On Sun, Aug 18, 2013 at 3:14 PM, Erick Erickson wrote:

> The lack of global TF/IDF has been answered in the past,
> in the sharded case, by "usually you have similar enough
> stats that it doesn't matter". This pre-supposes a fairly
> evenly distributed set of documents.
>
> But if you're talking about federated search across different
> types of documents, then what would you "rescore" with?
> How would you even consider scoring docs that are somewhat/
> totally different? Think magazine articles and meta-data associated
> with pictures.
>
> What I've usually found is that one can use grouping to show
> the top N of a variety of results. Or show tabs with different
> types. Or have the app intelligently combine the different types
> of documents in a way that "makes sense". But I don't know
> how you'd just get "the right thing" to happen with some kind
> of scoring magic.
>
> Best
> Erick
>
>
> On Fri, Aug 16, 2013 at 4:07 PM, Dan Davis  wrote:
>
>> I've thought about it, and I have no time to really do a meta-search
>> during
>> evaluation.  What I need to do is to create a single core that contains
>> both of my data sets, and then describe the architecture that would be
>> required to do blended results, with liberal estimates.
>>
>> From the perspective of evaluation, I need to understand whether any of
>> the
>> solutions to better ranking in the absence of global IDF have been
>> explored?I suspect that one could retrieve a much larger than N set of
>> results from a set of shards, re-score in some way that doesn't require
>> IDF, e.g. storing both results in the same priority queue and *re-scoring*
>> before *re-ranking*.
>>
>> The other way to do this would be to have a custom SearchHandler that
>> works
>> differently - it performs the query, retries all results deemed relevant
>> by
>> another engine, adds them to the Lucene index, and then performs the query
>> again in the standard way.   This would be quite slow, but perhaps useful
>> as a way to evaluate my method.
>>
>> I still welcome any suggestions on how such a SearchHandler could be
>> implemented.
>>
>
>


Re: Adding one core to an existing core?

2013-08-26 Thread Bruno Mannina

ok thanks !

On 26/08/2013 17:52, Jack Krupansky wrote:

Unfortunately, there is no -Dcore property, so you have to use -Durl:

java -Durl=http://localhost:8983/solr/collection2/update ... -jar 
post.jar ...


You have the proper /select syntax.

-- Jack Krupansky

-Original Message- From: Bruno Mannina
Sent: Monday, August 26, 2013 9:36 AM
To: solr-user@lucene.apache.org
Subject: Re: Adding one core to an existing core?

Dear Solr User,

now I have 2 cores "collection1" "collection2"

Default collection is the "Collection1"

I have two questions:

- Does a parameter exist that I can add to my HTML link to indicate the selected
core?
http://xxx.xxx.xxx.xxx/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on 



I mean by default is the collection1, if I want "collection2" I use the
link:
http://xxx.xxx.xxx.xxx/solr/collection2/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on 



Does a param such as &core=collection2 exist, instead of using a different link?


- My second question concerns updating.
Actually with one core, I do:
java -jar post.jar foo.xml

I suppose now I must add the desired core, no?
i.e.: -Dcore=collection2

What is the param to add in my command line?

Thanks a lot !

Bruno





On 22/08/2013 16:23, Andrea Gazzarini wrote:
First, a core is a separate index so it is completely independent
from the already existing core(s). So basically you don't need to 
reindex.


In order to have two cores (but the same applies for n cores): you 
must have in your solr.home the file (solr.xml) described here


http://wiki.apache.org/solr/Solr.xml%20%28supported%20through%204.x%29

then, you must obviously have one or two directories (corresponding 
to the "instanceDir" attribute). I said one or two because if the 
indexes configuration is basically the same (or something changes but 
is dynamically configured - i.e. core name) you can create two 
instances starting from the same configuration. I mean



 
  
  
 


Otherwise you must have two different conf directories that contain 
indexes configuration. You should already have a first one (the 
current core), you just need to have another conf dir with 
solrconfig.xml, schema.xml and other required files. In this case 
each core will have its own instanceDir.
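
A minimal sketch of the two solr.xml variants being described (core names and
directories are illustrative):

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- variant 1: two cores sharing one configuration (same instanceDir) -->
    <core name="collection1" instanceDir="shared" dataDir="collection1/data"/>
    <core name="collection2" instanceDir="shared" dataDir="collection2/data"/>
    <!-- variant 2: give each core its own instanceDir, e.g.
         instanceDir="collection1" and instanceDir="collection2",
         each with its own conf/solrconfig.xml and conf/schema.xml -->
  </cores>
</solr>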



 
  
  
 


Best,
Andrea



On 08/22/2013 04:04 PM, Bruno Mannina wrote:

Little precision, I'm on Ubuntu 12.04LTS

On 22/08/2013 15:56, Bruno Mannina wrote:

Dear Users,

(Solr3.6 + Tomcat7)

I use since two years Solr with one core, I would like now to add 
one another core (a new database).


Can I do this without re-indexing my core1 ?
could you point me to a good tutorial to do that?

(my current database is around 200Go for 86 000 000 docs)
My new database will be little, around 1000 documents of 5ko each.

thanks a lot,
Bruno
















Re: Default query operator "OR" wont work in some cases

2013-08-26 Thread Erick Erickson
Try adding &debug=query to your URL, that'll show you
how the parsing actually happened and should give you
some pointers.

Best,
Erick


On Mon, Aug 26, 2013 at 9:55 AM, smanad  wrote:

> Hi,
>
> I have some documents with keywords "egg" and some with "salad" and some
> with "egg salad".
> When I search for egg salad, I expect to see egg results + salad results. I
> don't see them.
> egg and salad queries individually work fine.
> I am using whitespacetokenizer.
>
> Not sure if I am missing something.
> Thanks,
> -Manasi
>
>
>
>
>
>


Re: Solr 4.2.1 update to 4.3/4.4 problem

2013-08-26 Thread Erick Erickson
What is a "select" analyzer type? Never seen one of those before
or I'm just blanking

Either of those types should work for case-insensitive search, did
you re-index?
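
For reference, a minimal sketch of a case-insensitive whole-string field type of the
kind being discussed (the type name is illustrative, not taken from the poster's schema):

<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

The same chain runs at index and query time, so values containing spaces, '-' or '\'
match as a single lower-cased token; the field has to be re-indexed after the change.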

And please don't hijack threads, start a new subject with new
questions.

Best
Erick



On Mon, Aug 26, 2013 at 7:42 AM, skorrapa wrote:

> I have also re-indexed the data and tried. I also tried with the below:
>sortMissingLast="true" omitNorms="true">
>   
> 
> 
>   
> 
> 
> 
>   
> 
> 
> 
>   
> 
> This didnt work as well...
>
>
>
> On Mon, Aug 26, 2013 at 4:03 PM, skorrapa [via Lucene] <
> ml-node+s472066n4086601...@n3.nabble.com> wrote:
>
> > Hello All,
> >
> > I am still facing the same issue. Case insensitive search is not working
> on
> > Solr 4.3
> > I am using the below configurations in schema.xml
> >  > sortMissingLast="true" omitNorms="true">
> >   
> > 
> > 
> >   
> > 
> > 
> > 
> >   
> > 
> > 
> > 
> >   
> > 
> > Basically I want my string which could have spaces or characters like '-'
> > or \ to be searched upon case insensitively.
> > Please help.
> >
> >
> > --
> >
>
>
>
>
>
>


Re: Grouping

2013-08-26 Thread tvellore
I'm getting the same error...Is there any workaround to this?





Re: Master / Slave Set Up Documentation

2013-08-26 Thread Jared Griffith
Ha, I guess I didn't see that page listed in the Table of Contents; it's
definitely Monday.  Thanks.


On Mon, Aug 26, 2013 at 10:36 AM, Andrea Gazzarini <
andrea.gazzar...@gmail.com> wrote:

> You mean this
>
> http://wiki.apache.org/solr/SolrReplication
>
> ?
>
> What's wrong with this page? It seems clear.
> I'm widely using replication and the first time I set up a 1 master + 2
> slaves by simply following that page
> On 26 Aug 2013 18:54, "Jared Griffith"  wrote:
>
> > Hello,
> > I'm new to this Solr thing, and I was wondering if there is any good /
> > solid documentation on setting up and running replication.  I'm going
> > through the Wiki but I am not seeing anything that is obvious there.
> >
> > --
> >
> > Jared Griffith
> > Linux Administrator, PICS Auditing, LLC
> > P: (949) 936-4574
> > C: (909) 653-7814
> >
> > 
> >
> > 17701 Cowan #140 | Irvine, CA | 92614
> >
> > Join PICS on LinkedIn and Twitter!
> >
> > 
> >
>



-- 

Jared Griffith
Linux Administrator, PICS Auditing, LLC
P: (949) 936-4574
C: (909) 653-7814



17701 Cowan #140 | Irvine, CA | 92614

Join PICS on LinkedIn and Twitter!




Re: Dropping Caches of Machine That Solr Runs At

2013-08-26 Thread Walter Underwood
We use Amazon EC2 machines with 34GB of memory (m2.2xlarge). The Solr heap is 
8GB. We have several cores, totaling about 14GB on disk. This configuration 
allows 100% of the indexes to be in file buffers.

wunder

On Aug 26, 2013, at 9:57 AM, Furkan KAMACI wrote:

> Hi Walter;
> 
> You said you are caching your documents. What is average Physical Memory
> usage of your Solr Nodes?
> 
> 
> 2013/8/26 Walter Underwood 
> 
>> It looks like that error happens when reading XML from an HTTP request. The
>> XML ends too soon. This should be unrelated to file buffers.
>> 
>> wunder
>> 
>> On Aug 26, 2013, at 9:17 AM, Furkan KAMACI wrote:
>> 
>>> It has a 48 GB of RAM and index size is nearly 100 GB at each node. I
>> have
>>> CentOS 6.4. While indexing I got that error and I am suspicious about
>> that
>>> it is because of high percentage of Physical Memory usage.
>>> 
>>> ERROR - 2013-08-21 22:01:30.979; org.apache.solr.common.SolrException;
>>> java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException]
>>> early EOF
>>> at
>>> 
>> com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
>>> at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
>>> at
>>> 
>> com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
>>> at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
>>> at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393)
>>> at
>>> 
>> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:245)
>>> at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
>>> at
>>> 
>> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>>> at
>>> 
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>>> at
>>> 
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1812)
>>> at
>>> 
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
>>> at
>>> 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
>>> at
>>> 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
>>> at
>>> 
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
>>> at
>>> 
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
>>> at
>>> 
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>>> at
>>> 
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
>>> at
>>> 
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>>> at
>>> 
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
>>> at
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
>>> at
>>> 
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>>> at
>>> 
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
>>> at
>>> 
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>>> at
>>> 
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>>> at
>>> 
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>>> at
>>> 
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>>> at org.eclipse.jetty.server.Server.handle(Server.java:365)
>>> at
>>> 
>> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
>>> at
>>> 
>> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
>>> at
>>> 
>> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
>>> at
>>> 
>> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
>>> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:948)
>>> at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
>>> at
>>> 
>> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
>>> at
>>> 
>> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
>>> at
>>> 
>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>>> at
>>> 
>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>>> at java.lang.Thread.run(Thread.java:722)
>>> Caused by: org.eclipse.jetty.io.EofException: early EOF
>>> at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:65)
>>> at java.io.InputStream.read(InputStream.java:101)
>>> at com.ctc.wstx.io.UTF8Reader.loadMore(UTF8Reader.java:365)
>>> at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:110)
>>> at com.ctc.wstx.io.MergedReader.read(MergedReader.java

Re: Master / Slave Set Up Documentation

2013-08-26 Thread Andrea Gazzarini
You mean this

http://wiki.apache.org/solr/SolrReplication

?

What's wrong with this page? It seems clear.
I use replication widely, and the first time around I set up 1 master + 2
slaves simply by following that page.
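For a quick orientation, the core of that page is two solrconfig.xml handler
definitions; a minimal sketch (the host name, core name and poll interval below
are assumptions, not values from this thread):

  <!-- on the master -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="confFiles">schema.xml,stopwords.txt</str>
    </lst>
  </requestHandler>

  <!-- on each slave -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/collection1/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>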
On 26 Aug 2013 18:54, "Jared Griffith"  wrote:

> Hello,
> I'm new to this Solr thing, and I was wondering if there is any good /
> solid documentation on setting up and running replication.  I'm going
> through the Wiki but I am not seeing anything that is obvious there.
>
> --
>
> Jared Griffith
> Linux Administrator, PICS Auditing, LLC
> P: (949) 936-4574
> C: (909) 653-7814
>
> 
>
> 17701 Cowan #140 | Irvine, CA | 92614
>
> Join PICS on LinkedIn and Twitter!
>
> 
>


Re: Adding one core to an existing core?

2013-08-26 Thread Jack Krupansky

Unfortunately, there is no -Dcore property, so you have to use -Durl:

java -Durl=http://localhost:8983/solr/collection2/update ... -jar post.jar 
...


You have the proper /select syntax.

-- Jack Krupansky
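Spelled out, a full command of that shape might look like the following; the
port is an assumption, and foo.xml is the file name from Bruno's message:

  java -Durl=http://localhost:8983/solr/collection2/update -jar post.jar foo.xml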

-Original Message- 
From: Bruno Mannina

Sent: Monday, August 26, 2013 9:36 AM
To: solr-user@lucene.apache.org
Subject: Re: Adding one core to an existing core?

Dear Solr User,

now I have 2 cores "collection1" "collection2"

Default collection is the "Collection1"

I have two questions:

- Is there a parameter to add to my HTML link to indicate the selected
core?
http://xxx.xxx.xxx.xxx/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on

I mean, by default it is collection1; if I want "collection2" I use the
link:
http://xxx.xxx.xxx.xxx/solr/collection2/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on

Is there a param &core=collection2 instead of using a different link?


- My second question concerns updating.
Actually with one core, I do:
java -jar post.jar foo.xml

I suppose now I must add the desired core, no?
i.e.: -Dcore=collection2

What is the param to add in my command line?

Thanks a lot !

Bruno





Le 22/08/2013 16:23, Andrea Gazzarini a écrit :
First, a core is a separate index, so it is completely independent from the 
already existing core(s). So basically you don't need to reindex.


In order to have two cores (but the same applies for n cores): you must 
have in your solr.home the file (solr.xml) described here


http://wiki.apache.org/solr/Solr.xml%20%28supported%20through%204.x%29

then, you must obviously have one or two directories (corresponding to the 
"instanceDir" attribute). I said one or two because if the indexes 
configuration is basically the same (or something changes but is 
dynamically configured - i.e. core name) you can create two instances 
starting from the same configuration. I mean



 
  
  
 


Otherwise you must have two different conf directories that contain 
indexes configuration. You should already have a first one (the current 
core), you just need to have another conf dir with solrconfig.xml, 
schema.xml and other required files. In this case each core will have its 
own instanceDir.



 
  
  
 


Best,
Andrea



On 08/22/2013 04:04 PM, Bruno Mannina wrote:

Little precision, I'm on Ubuntu 12.04LTS

Le 22/08/2013 15:56, Bruno Mannina a écrit :

Dear Users,

(Solr3.6 + Tomcat7)

I use since two years Solr with one core, I would like now to add one 
another core (a new database).


Can I do this without re-indexing my core1 ?
could you point me to a good tutorial to do that?

(my current database is around 200 GB for 86,000,000 docs)
My new database will be small, around 1000 documents of 5 KB each.

thanks a lot,
Bruno
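Since the archive stripped the XML from Andrea's two solr.xml examples above, a
minimal sketch of the legacy (pre core-discovery) layout he describes may help;
the core and directory names here are assumptions:

  <!-- variant 1: both cores share one instanceDir/conf, with separate dataDirs -->
  <solr persistent="true">
    <cores adminPath="/admin/cores">
      <core name="collection1" instanceDir="shared" dataDir="collection1_data"/>
      <core name="collection2" instanceDir="shared" dataDir="collection2_data"/>
    </cores>
  </solr>

  <!-- variant 2: each core has its own instanceDir with its own conf/ -->
  <solr persistent="true">
    <cores adminPath="/admin/cores">
      <core name="collection1" instanceDir="collection1"/>
      <core name="collection2" instanceDir="collection2"/>
    </cores>
  </solr>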












Re: Dropping Caches of Machine That Solr Runs At

2013-08-26 Thread Furkan KAMACI
Hi Walter;

You said you are caching your documents. What is average Physical Memory
usage of your Solr Nodes?


2013/8/26 Walter Underwood 

> It looks like that error happens when reading XML from an HTTP request. The
> XML ends too soon. This should be unrelated to file buffers.
>
> wunder
>
> On Aug 26, 2013, at 9:17 AM, Furkan KAMACI wrote:
>
> > It has a 48 GB of RAM and index size is nearly 100 GB at each node. I
> have
> > CentOS 6.4. While indexing I got that error and I am suspicious about
> that
> > it is because of high percentage of Physical Memory usage.
> >
> > ERROR - 2013-08-21 22:01:30.979; org.apache.solr.common.SolrException;
> > java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException]
> > early EOF
> > at
> >
> com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
> > at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
> > at
> >
> com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
> > at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
> > at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393)
> > at
> >
> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:245)
> > at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
> > at
> >
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> > at
> >
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> > at
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1812)
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> > at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
> > at
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
> > at
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
> > at
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> > at
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
> > at
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> > at
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
> > at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
> > at
> >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> > at
> >
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
> > at
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> > at
> >
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> > at
> >
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> > at
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> > at org.eclipse.jetty.server.Server.handle(Server.java:365)
> > at
> >
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
> > at
> >
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> > at
> >
> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
> > at
> >
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
> > at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:948)
> > at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> > at
> >
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> > at
> >
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> > at
> >
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> > at
> >
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> > at java.lang.Thread.run(Thread.java:722)
> > Caused by: org.eclipse.jetty.io.EofException: early EOF
> > at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:65)
> > at java.io.InputStream.read(InputStream.java:101)
> > at com.ctc.wstx.io.UTF8Reader.loadMore(UTF8Reader.java:365)
> > at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:110)
> > at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
> > at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
> > at
> >
> com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
> > at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
> > at
> >
> com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4628)
> > at
> >
> com.ctc.wstx.

Re: ERROR org.apache.solr.update.CommitTracker – auto commit error...:org.apache.solr.common.SolrException: Error opening new searcher

2013-08-26 Thread Shawn Heisey
On 8/26/2013 1:54 AM, zhaoxin wrote:
> Caused by: java.lang.ClassCastException

Generally when you get this kind of error with Solr, it means you have a
mix of old and new jars.  This might be from an upgrade, where either
the old war expansion doesn't get removed, or from unnecessarily
including jars on your classpath.  If you are using custom code or a
code patch, it probably needs changing for a new Solr version.

Thanks,
Shawn
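One quick way to check for the jar mix Shawn describes is to list every Lucene
and Solr jar the servlet container can see; a sketch, with paths that are only
assumptions for a typical Tomcat install:

  find /opt/tomcat/webapps /opt/tomcat/lib -name 'lucene-*.jar' -o -name 'solr-*.jar' | sort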



Master / Slave Set Up Documentation

2013-08-26 Thread Jared Griffith
Hello,
I'm new to this Solr thing, and I was wondering if there is any good /
solid documentation on setting up and running replication.  I'm going
through the Wiki but I am not seeing anything that is obvious there.

-- 

Jared Griffith
Linux Administrator, PICS Auditing, LLC
P: (949) 936-4574
C: (909) 653-7814



17701 Cowan #140 | Irvine, CA | 92614

Join PICS on LinkedIn and Twitter!




Re: Dropping Caches of Machine That Solr Runs At

2013-08-26 Thread Walter Underwood
It looks like that error happens when reading XML from an HTTP request. The XML 
ends too soon. This should be unrelated to file buffers.

wunder

On Aug 26, 2013, at 9:17 AM, Furkan KAMACI wrote:

> It has a 48 GB of RAM and index size is nearly 100 GB at each node. I have
> CentOS 6.4. While indexing I got that error and I am suspicious about that
> it is because of high percentage of Physical Memory usage.
> 
> ERROR - 2013-08-21 22:01:30.979; org.apache.solr.common.SolrException;
> java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException]
> early EOF
> at
> com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
> at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
> at
> com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
> at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
> at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393)
> at
> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:245)
> at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
> at
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1812)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> at org.eclipse.jetty.server.Server.handle(Server.java:365)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
> at
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:948)
> at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> at
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: org.eclipse.jetty.io.EofException: early EOF
> at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:65)
> at java.io.InputStream.read(InputStream.java:101)
> at com.ctc.wstx.io.UTF8Reader.loadMore(UTF8Reader.java:365)
> at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:110)
> at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
> at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
> at
> com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
> at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
> at
> com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4628)
> at
> com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
> at
> com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
> at
> com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
> ... 36 more
> 
> 
> 
> 2013/8/26 Walter Underwood 
> 
>> What is the precise error? What kind of machine?
>> 
>> File buffers are a robust part of the OS. Unix h

Re: Dropping Caches of Machine That Solr Runs At

2013-08-26 Thread Furkan KAMACI
It has a 48 GB of RAM and index size is nearly 100 GB at each node. I have
CentOS 6.4. While indexing I got that error and I am suspicious about that
it is because of high percentage of Physical Memory usage.

ERROR - 2013-08-21 22:01:30.979; org.apache.solr.common.SolrException;
java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException]
early EOF
at
com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
at
com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393)
at
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:245)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1812)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:365)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:948)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.eclipse.jetty.io.EofException: early EOF
at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:65)
at java.io.InputStream.read(InputStream.java:101)
at com.ctc.wstx.io.UTF8Reader.loadMore(UTF8Reader.java:365)
at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:110)
at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
at
com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
at
com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4628)
at
com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
at
com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
at
com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
... 36 more



2013/8/26 Walter Underwood 

> What is the precise error? What kind of machine?
>
> File buffers are a robust part of the OS. Unix has had file buffer caching
> for decades.
>
> wunder
>
> On Aug 26, 2013, at 1:37 AM, Furkan KAMACI wrote:
>
> > Hi Walter;
> >
> > You are right about performance. However when I index documents on a
> > machine that has  a high percentage of Physical Memory usage I get EOF
> > errors?
> >
> >
> > 2013/8/26 Walter Underwood 
> >
> >> On Aug 25, 2013, at 1:41 PM, Furkan KAMACI wrote:
> >>
> >>> S

Re: custom names for replicas in solrcloud

2013-08-26 Thread Jack Krupansky

No, it is part of the core admin API.


-- Jack Krupansky
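As an illustration, a CoreAdmin CREATE call that names the replica explicitly
might look like the following; the host, core name, collection and shard values
are assumptions:

  http://localhost:8983/solr/admin/cores?action=CREATE&name=collection1_replica_a&collection=collection1&shard=shard1&coreNodeName=replica_a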

-Original Message- 
From: smanad

Sent: Monday, August 26, 2013 10:02 AM
To: solr-user@lucene.apache.org
Subject: Re: custom names for replicas in solrcloud

Is coreNodeName exposed via collections api?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/custom-names-for-replicas-in-solrcloud-tp4086205p4086628.html
Sent from the Solr - User mailing list archive at Nabble.com. 



RE: "Caused by: java.net.SocketException: Connection reset by peer: socket write error" solr querying

2013-08-26 Thread Greg Walters
AnilJayanti,

Have you checked your entire stack, from the client all the way to Solr, along 
with anything between them? Your timeout values should match everywhere, and if 
there's something between the client and server that times out before either 
the client or server does, it'll cause that error as well.

A quick google search shows similar causes:
http://stackoverflow.com/questions/13719645/comitted-before-500-null-error-in-solr-3-6-1
http://lucene.472066.n3.nabble.com/jetty-error-broken-pipe-td3522120.html

How long after the client sends a request does it take for that error to show 
up in the logs and what happens client side when you see the error?
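For reference, on the Solr side that timeout usually lives on the connector in
Jetty's etc/jetty.xml; a sketch, where the 50000 ms value is only an assumption
for illustration:

  <Call name="addConnector">
    <Arg>
      <New class="org.eclipse.jetty.server.bio.SocketConnector">
        <Set name="port">8983</Set>
        <Set name="maxIdleTime">50000</Set>
      </New>
    </Arg>
  </Call>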


-Original Message-
From: aniljayanti [mailto:aniljaya...@yahoo.co.in] 
Sent: Sunday, August 25, 2013 11:28 PM
To: solr-user@lucene.apache.org
Subject: RE: "Caused by: java.net.SocketException: Connection reset by peer: 
socket write error" solr querying

Hi Greg,

thanks for reply,

I tried to set the maxIdleTime to 30 milliseconds, but I am still getting the same 
error.

WARN  - 2013-08-26 09:44:29.058; org.eclipse.jetty.server.Response;
Committed before 500 {msg=Connection reset by peer: socket write 
error,trace=org.eclipse.jetty.io.EofException
at 
org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:914)
at
org.eclipse.jetty.http.AbstractGenerator.blockForOutput(AbstractGenerator.java:507)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:170)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:272)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:276)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
at org.apache.solr.util.FastWriter.flush(FastWriter.java:137)
at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:648)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:375)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketException: Connection reset by peer: socket write 
error
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at 
org.eclipse.jetty.io.ByteArrayBuffer.writeTo(ByteArrayBuffer.java:375)
at 
org.eclipse.jetty.io.bio.StreamEndP

Re: Dropping Caches of Machine That Solr Runs At

2013-08-26 Thread Walter Underwood
What is the precise error? What kind of machine?

File buffers are a robust part of the OS. Unix has had file buffer caching for 
decades.

wunder

On Aug 26, 2013, at 1:37 AM, Furkan KAMACI wrote:

> Hi Walter;
> 
> You are right about performance. However when I index documents on a
> machine that has  a high percentage of Physical Memory usage I get EOF
> errors?
> 
> 
> 2013/8/26 Walter Underwood 
> 
>> On Aug 25, 2013, at 1:41 PM, Furkan KAMACI wrote:
>> 
>>> Sometimes Physical Memory usage of Solr is over %99 and this may cause
>>> problems. Do you run such kind of a command periodically:
>>> 
>>> sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
>>> 
>>> to force dropping caches of machine that Solr runs at and avoid problems?
>> 
>> 
>> This is a terrible idea. The OS automatically manages the file buffers.
>> When they are all used, that is a  good thing, because it reduced disk IO.
>> 
>> After this, no files will be cached in RAM. Every single read from a file
>> will have to go to disk. This will cause very slow performance until the
>> files are recached.
>> 
>> Recently, I did exactly the opposite to improve performance in our Solr
>> installation. Before starting the Solr process, a script reads every file
>> in the index so that it will already be in file buffers. This avoids
>> several minutes of high disk IO and slow performance after startup.
>> 
>> wunder
>> Search Guy, Chegg.com
>> 
>> 
>> 

--
Walter Underwood
wun...@wunderwood.org
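The pre-reading trick Walter mentions can be a one-liner run before starting
Solr; a sketch, with the index path as an assumption:

  # read every index file once so it is already in the OS page cache at startup
  find /var/solr/data/collection1/index -type f -print0 | xargs -0 cat > /dev/null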





Can a data import handler grab all pages of an RSS feed?

2013-08-26 Thread eShard
Good morning,
I have an IBM Portal atom feed that spans multiple pages.
Is there a way to instruct the DIH to grab all available pages?
I can put a huge range in but that can be extremely slow with large amounts
of XML data.
I'm currently using Solr 4.0 final.

Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-a-data-import-handler-grab-all-pages-of-an-RSS-feed-tp4086635.html
Sent from the Solr - User mailing list archive at Nabble.com.


autoCommit and autoSoftCommit

2013-08-26 Thread Bryan Bende
I'm running Solr 4.3 with:


  6
  false



  5000


When I start Solr and send in a couple of hundred documents, I am able to
retrieve documents after 5 seconds using SolrJ. However, from the Solr
admin console if I query for *:* it will show that there are docs in the
numFound attribute, but none of the results have the stored fields present.

As a test I also tried modifying the autoCommit to add maxDocs like this:

  100
  6
  false


It seems like with this configuration something different happens... if I
send in 150 docs then the first 100 will show up correctly through Solr
admin, but the last 50 that didn't hit the maxDocs threshold still don't
show the stored fields.

Is it expected that maxDocs and maxTime do something different when
committing?

If using autoCommit with openSearcher=false and autoSoftCommit, does the
client ever have to send a hard commit with openSearcher=true ?

- Bryan
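The commit settings shown above lost their XML tags in the archive; for
reference, a typical shape for that solrconfig.xml block looks like the
following, where the maxTime values are assumptions for illustration:

  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <autoSoftCommit>
    <maxTime>5000</maxTime>
  </autoSoftCommit>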


Re: Tokenization at query time

2013-08-26 Thread Andrea Gazzarini

On 08/26/2013 04:09 PM, Erick Erickson wrote:

right, edismax is much preferred, dismax hasn't been formally deprecated,
but almost nobody uses it...

Good to know...I basically use dismax in ALL my SOLR instances :D

I'd be really careful about adding whitespace to the list of escape chars
because it changes the semantics of the search. While it'll work for this
specific case, if you use it in other cases it will change the sense of the
query. This may be OK, but be careful, it might be better to do this
specifically on an as-needed basis...
Yes, that's the reason why I'm not really sure about what I did...I'm 
running my regression tests...all seems green...let's see

But you know your problem space best

Best,
Erick

Thank you very much

Best,
Gazza



On Mon, Aug 26, 2013 at 9:04 AM, Andrea Gazzarini <
andrea.gazzar...@gmail.com> wrote:


Hi Erick,
sorry, I forgot the SOLR version...it's 3.6.0

ClientUtils in that version does whitespace escaping:

   public static String escapeQueryChars(String s) {
 StringBuilder sb = new StringBuilder();
 for (int i = 0; i < s.length(); i++) {
   char c = s.charAt(i);
   // These characters are part of the query syntax and must be escaped
   if (c == '\\' || c == '+' || c == '-' || c == '!'  || c == '(' || c
== ')' || c == ':'
 || c == '^' || c == '[' || c == ']' || c == '\"' || c == '{' || c
== '}' || c == '~'
 || c == '*' || c == '?' || c == '|' || c == '&'  || c == ';'
 || Character.isWhitespace(c)) {
 sb.append('\\');
   }
   sb.append(c);
 }
 return sb.toString();
   }

Now, I solved the issue but not really sure about that.

Debugging the code I saw that the query string (on the SearchHandler)


978\ 90\ 04\ 23560\ 1

once passed through DismaxQueryParser (specifically through
SolrPluginUtils.partialEscape(**CharSequence)

becomes


978\\ 90\\ 04\\ 23560\\ 1

because that method escapes the backslashes

So, using the eclipse debugger I removed at runtime the additional
backslash and it works perfectly but of course...I can't do that in
production for every search :D

So, just to try, I changed dismax to edismax which, I saw, doesn't call 
SolrPluginUtils... and it works perfectly!

I saw in your query string that you used edismax too...maybe that's the
point?

Many thanks
Andrea


On 08/26/2013 02:47 PM, Erick Erickson wrote:


Andrea:

Works for me, admittedly through the browser

I suspect the problem is here: ClientUtils.**escapeQueryChars


That doesn't do anything about escaping the spaces, it just handles
characters that have special meaning to the query syntax, things like +-
etc.

Using your field definition, this:
http://localhost:8983/solr/**select?wt=json&q=ab\cd\
ef&debug=query&defType=**edismax&qf=name eoe
produced this output..

 - parsedquery_toString: "+(eoe:abcdef | (name:ab name:cd name:ef))",



where the field "eoe" is your isbn_issn type.

Best,
Erick


On Mon, Aug 26, 2013 at 4:55 AM, Andrea Gazzarini <
andrea.gazzar...@gmail.com> wrote:

  Hi Erick,

escaping spaces doesn't work...

Briefly,

- In a document I have an ISBN field that (stored value) is
*978-90-04-23560-1*
- In the index I have this value: *9789004235601*

Now, I want to be able to search the document by using:

1) q=*978-90-04-23560-1*
2) q=*978 90 04 23560 1*
3) q=*9789004235601*

1 and 3 works perfectly, 2 doesn't work.

My code is:

SolrQuery query = new SolrQuery(ClientUtils.escapeQueryChars(req.getParameter("q")));

isbn is declared in this way


  
  


  
  
  


search handler is:

  
  
  *dismax*

  100%
  
*isbn_issn_search*^1
  
  
*isbn_issn_search*^10
  
  0
  0.1
  ...
  

This is what I get:

*1) 978-90-04-23560-1**
*path=/select params={start=0&q=*978\-90\-
04\-23560\-1*&sfield=&qt=any_*
*bc&wt=javabin&rows=10&**version=**2} *hits=1* status=0 QTime=5*

2) ***9789004235601*
*webapp=/solr path=/select params={start=0&q=***
9789004235601*&sfield=&qt=any_bc&wt=javabin&rows=10&**version=**2}

*hits=1* status=0 QTime=5*

3) **978 90 04 23560 1**
*path=/select params={start=0&*q=978\+90\+
04\+23560\+1*&sfield=&qt=any_*
*bc&wt=javabin&rows=10&**version=**2} *hits=0 *status=0 QTime=2*


*Extract from queryDebug=true:

978\ 90\ 04\ 23560\ 1
...
978\ 90\ 04\ 23560\ 1
978\ 90\ 04\ 23560\ 1
...

  +((DisjunctionMaxQuery((isbn_issn_search:*978*^1.0)~0.**
**1)
  DisjunctionMaxQuery((isbn_issn_search:*90*^1.0)~0.1)
  DisjunctionMaxQuery((isbn_issn_search:*04*^1.0)~0.1)
  DisjunctionMaxQuery((isbn_issn_search:*23560*^1.0)~0.1)
  DisjunctionMaxQuery((isbn_issn_search:*1*^1.0)~0.1))~5)
  DisjunctionMaxQuery((isbn_issn_search:*9789004235601*^**
10.0)~0.1)


--

Re: Tokenization at query time

2013-08-26 Thread Erick Erickson
right, edismax is much preferred, dismax hasn't been formally deprecated,
but almost nobody uses it...

I'd be really careful about adding whitespace to the list of escape chars
because it changes the semantics of the search. While it'll work for this
specific case, if you use it in other cases it will change the sense of the
query. This may be OK, but be careful, it might be better to do this
specifically on an as-needed basis...

But you know your problem space best

Best,
Erick
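For reference, a minimal SolrJ sketch of the edismax switch Andrea describes;
the query string, field name and boost are taken from the thread, but forcing
defType from the client is an assumption, not his exact code:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.util.ClientUtils;

  public class IsbnQueryExample {
      public static void main(String[] args) {
          // escape the raw input, then let edismax (rather than dismax) parse it,
          // so the escaped spaces are not escaped a second time by the parser
          String raw = "978 90 04 23560 1";
          SolrQuery query = new SolrQuery(ClientUtils.escapeQueryChars(raw));
          query.set("defType", "edismax");
          query.set("qf", "isbn_issn_search^10");
          System.out.println(query);
      }
  }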


On Mon, Aug 26, 2013 at 9:04 AM, Andrea Gazzarini <
andrea.gazzar...@gmail.com> wrote:

> Hi Erick,
> sorry, I forgot the SOLR version...it's 3.6.0
>
> ClientUtils in that version does whitespace escaping:
>
>   public static String escapeQueryChars(String s) {
> StringBuilder sb = new StringBuilder();
> for (int i = 0; i < s.length(); i++) {
>   char c = s.charAt(i);
>   // These characters are part of the query syntax and must be escaped
>   if (c == '\\' || c == '+' || c == '-' || c == '!'  || c == '(' || c
> == ')' || c == ':'
> || c == '^' || c == '[' || c == ']' || c == '\"' || c == '{' || c
> == '}' || c == '~'
> || c == '*' || c == '?' || c == '|' || c == '&'  || c == ';'
> || Character.isWhitespace(c)) {
> sb.append('\\');
>   }
>   sb.append(c);
> }
> return sb.toString();
>   }
>
> Now, I solved the issue but not really sure about that.
>
> Debugging the code I saw that the query string (on the SearchHandler)
>
>
> 978\ 90\ 04\ 23560\ 1
>
> once passed through DismaxQueryParser (specifically through
> SolrPluginUtils.partialEscape(**CharSequence)
>
> becomes
>
>
> 978\\ 90\\ 04\\ 23560\\ 1
>
> because that method escapes the backslashes
>
> So, using the eclipse debugger I removed at runtime the additional
> backslash and it works perfectly but of course...I can't do that in
> production for every search :D
>
> So, just to try, I changed dismax to edismax which, I saw, doesn't call
> SolrPluginUtils... and it works perfectly!
>
> I saw in your query string that you used edismax too...maybe that's the
> point?
>
> Many thanks
> Andrea
>
>
> On 08/26/2013 02:47 PM, Erick Erickson wrote:
>
>> Andrea:
>>
>> Works for me, admittedly through the browser
>>
>> I suspect the problem is here: ClientUtils.**escapeQueryChars
>>
>>
>> That doesn't do anything about escaping the spaces, it just handles
>> characters that have special meaning to the query syntax, things like +-
>> etc.
>>
>> Using your field definition, this:
>> http://localhost:8983/solr/**select?wt=json&q=ab\cd\
>> ef&debug=query&defType=**edismax&qf=name eoe
>> produced this output..
>>
>> - parsedquery_toString: "+(eoe:abcdef | (name:ab name:cd name:ef))",
>>
>>
>>
>> where the field "eoe" is your isbn_issn type.
>>
>> Best,
>> Erick
>>
>>
>> On Mon, Aug 26, 2013 at 4:55 AM, Andrea Gazzarini <
>> andrea.gazzar...@gmail.com> wrote:
>>
>>  Hi Erick,
>>> escaping spaces doesn't work...
>>>
>>> Briefly,
>>>
>>> - In a document I have an ISBN field that (stored value) is
>>> *978-90-04-23560-1*
>>> - In the index I have this value: *9789004235601*
>>>
>>> Now, I want to be able to search the document by using:
>>>
>>> 1) q=*978-90-04-23560-1*
>>> 2) q=*978 90 04 23560 1*
>>> 3) q=*9789004235601*
>>>
>>> 1 and 3 works perfectly, 2 doesn't work.
>>>
>>> My code is:
>>>
>>> SolrQuery query = new SolrQuery(ClientUtils.escapeQueryChars(req.getParameter("q")));
>>>
>>> isbn is declared in this way
>>>
>>> >> positionIncrementGap="100">
>>>  
>>>  
>>>
>>>
>>>  
>>>  >> generateWordParts="0" generateNumberParts="0" catenateWords="0"
>>> catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"/>
>>>  
>>> 
>>> 
>>> search handler is:
>>>
>>>  >> default="true">
>>>  
>>>  *dismax*
>>>
>>>  100%
>>>  
>>> *isbn_issn_search*^1
>>>  
>>>  
>>> *isbn_issn_search*^10
>>>  
>>>  0
>>>  0.1
>>>  ...
>>>  
>>>
>>> This is what I get:
>>>
>>> *1) 978-90-04-23560-1**
>>> *path=/select params={start=0&q=*978\-90\-
>>> 04\-23560\-1*&sfield=&qt=any_*
>>> *bc&wt=javabin&rows=10&**version=**2} *hits=1* status=0 QTime=5*
>>>
>>> 2) ***9789004235601*
>>> *webapp=/solr path=/select params={start=0&q=***
>>> 9789004235601*&sfield=&qt=any_bc&wt=javabin&rows=10&**version=**2}
>>>
>>> *hits=1* status=0 QTime=5*
>>>
>>> 3) **978 90 04 23560 1**
>>> *path=/select params={start=0&*q=978\+90\+
>>> 04\+23560\+1*&sfield=&qt=any_*
>>> *bc&wt=javabin&rows=10&**version=**2} *hits=0 *status=0 QTime=2*
>>>
>>>
>>> *Extract from queryDebug=true:
>>>
>>> 978\ 90\ 04\ 23560\ 1
>>> ...
>>> 978\ 90\ 04\ 23560\ 1
>>> 978\ 90\ 04\ 23560\ 1
>>> ...
>>> 
>>>  +((DisjunctionMaxQuery((isbn_issn_search:*978*^1.0)~0.**
>>> **1)
>>>  DisjunctionMaxQuery((isbn_issn_search:*90*^1.0)~0

Re: custom names for replicas in solrcloud

2013-08-26 Thread smanad
Is coreNodeName exposed via collections api?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/custom-names-for-replicas-in-solrcloud-tp4086205p4086628.html
Sent from the Solr - User mailing list archive at Nabble.com.


Default query operator "OR" wont work in some cases

2013-08-26 Thread smanad
Hi, 

I have some documents with keywords "egg" and some with "salad" and some
with "egg salad".
When I search for egg salad, I expect to see egg results + salad results. I
don't see them.
egg and salad queries individually work fine. 
I am using whitespacetokenizer.

Not sure if I am missing something.
Thanks, 
-Manasi 
 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Default-query-operator-OR-wont-work-in-some-cases-tp4086624.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Different Responses for 4.4 and 3.5 solr index

2013-08-26 Thread Stefan Matheis
Did you check the scoring? (use fl=*,score to retrieve it) .. additionally 
debugQuery=true might provide more information about how the score was 
calculated.

- Stefan 
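Concretely, a request of the kind Stefan suggests could look like the one
below; the host and handler are assumptions, the other parameters come from the
earlier message:

  http://localhost:8983/solr/select?q=Apple&fq=table:profile&fl=p_id,p_first_name,p_last_name,score&debugQuery=true&rows=10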


On Monday, August 26, 2013 at 12:46 AM, Kuchekar wrote:

> Hi,
> The response from 4.4 and 3.5 in the current scenario differs in the
> sequence in which results are given back to us.
> 
> For example :
> 
> Response from 3.5 solr is : id:A, id:B, id:C, id:D ...
> Response from 4.4 solr is : id C, id:A, id:D, id:B...
> 
> Looking forward your reply.
> 
> Thanks.
> Kuchekar, Nilesh
> 
> 
> On Sun, Aug 25, 2013 at 11:32 AM, Stefan Matheis
> mailto:matheis.ste...@gmail.com)>wrote:
> 
> > Kuchekar (hope that's your first name?)
> > 
> > you didn't tell us .. how they differ? do you get an actual error? or does
> > the result contain documents you didn't expect? or the other way round,
> > that some are missing you'd expect to be there?
> > 
> > - Stefan
> > 
> > 
> > On Sunday, August 25, 2013 at 4:43 PM, Kuchekar wrote:
> > 
> > > Hi,
> > > 
> > > We get different response when we query 4.4 and 3.5 solr using same
> > > query params.
> > > 
> > > My query param are as following :
> > > 
> > > facet=true
> > > &facet.mincount=1
> > > &facet.limit=25
> > > 
> > 
> > &qf=content^0.0+p_last_name^500.0+p_first_name^50.0+strong_topic^0.0+first_author_topic^0.0+last_author_topic^0.0+title_topic^0.0
> > > &wt=javabin
> > > &version=2
> > > &rows=10
> > > &f.affiliation_org.facet.limit=150
> > > &fl=p_id,p_first_name,p_last_name
> > > &start=0
> > > &q=Apple
> > > &facet.field=affiliation_org
> > > &fq=table:profile
> > > &fq=num_content:[*+TO+1500]
> > > &fq=name:"Apple"
> > > 
> > > The content in both (solr 4.4 and solr 3.5) are same.
> > > 
> > > The solrconfig.xml from 3.5 an 4.4 are similarly constructed.
> > > 
> > > Is there something I am missing that might have been changed in 4.4,
> > > which might be causing this issue? The "qf" params look the same.
> > > 
> > > Looking forward for your reply.
> > > 
> > > Thanks.
> > > Kuchekar, Nilesh
> > > 
> > 
> > 
> 
> 
> 




Re: Adding one core to an existing core?

2013-08-26 Thread Bruno Mannina

Dear Solr User,

now I have 2 cores "collection1" "collection2"

Default collection is the "Collection1"

I have two questions:

- Is there a parameter to add to my HTML link to indicate the selected 
core?

http://xxx.xxx.xxx.xxx/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on

I mean, by default it is collection1; if I want "collection2" I use the 
link:

http://xxx.xxx.xxx.xxx/solr/collection2/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on

Is there a param &core=collection2 instead of using a different link?


- My second question concerns updating.
Actually with one core, I do:
java -jar post.jar foo.xml

I suppose now I must add the desired core, no?
i.e.: -Dcore=collection2

What is the param to add in my command line?

Thanks a lot !

Bruno





Le 22/08/2013 16:23, Andrea Gazzarini a écrit :
First, a core is a separate index, so it is completely independent from 
the already existing core(s). So basically you don't need to reindex.


In order to have two cores (but the same applies for n cores): you 
must have in your solr.home the file (solr.xml) described here


http://wiki.apache.org/solr/Solr.xml%20%28supported%20through%204.x%29

then, you must obviously have one or two directories (corresponding to 
the "instanceDir" attribute). I said one or two because if the indexes 
configuration is basically the same (or something changes but is 
dynamically configured - i.e. core name) you can create two instances 
starting from the same configuration. I mean



 
  
  
 


Otherwise you must have two different conf directories that contain 
indexes configuration. You should already have a first one (the 
current core), you just need to have another conf dir with 
solrconfig.xml, schema.xml and other required files. In this case each 
core will have its own instanceDir.



 
  
  
 


Best,
Andrea



On 08/22/2013 04:04 PM, Bruno Mannina wrote:

Little precision, I'm on Ubuntu 12.04LTS

Le 22/08/2013 15:56, Bruno Mannina a écrit :

Dear Users,

(Solr3.6 + Tomcat7)

I use since two years Solr with one core, I would like now to add 
one another core (a new database).


Can I do this without re-indexing my core1 ?
could you point me to a good tutorial to do that?

(my current database is around 200 GB for 86,000,000 docs)
My new database will be small, around 1000 documents of 5 KB each.

thanks a lot,
Bruno












adding support for deleteInstanceDir from solrj

2013-08-26 Thread Lyuba Romanchuk
Hi all,

Did anyone have a chance to look at the code?

It's attached here: https://issues.apache.org/jira/browse/SOLR-5023.



Thank you very much.

Lyuba


Re: Tokenization at query time

2013-08-26 Thread Andrea Gazzarini

Hi Erick,
sorry, I forgot the SOLR version...it's 3.6.0

ClientUtils in that version does whitespace escaping:

  public static String escapeQueryChars(String s) {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s.length(); i++) {
  char c = s.charAt(i);
  // These characters are part of the query syntax and must be escaped
  if (c == '\\' || c == '+' || c == '-' || c == '!'  || c == '(' || 
c == ')' || c == ':'
|| c == '^' || c == '[' || c == ']' || c == '\"' || c == '{' || 
c == '}' || c == '~'

|| c == '*' || c == '?' || c == '|' || c == '&'  || c == ';'
|| Character.isWhitespace(c)) {
sb.append('\\');
  }
  sb.append(c);
}
return sb.toString();
  }

Now, I solved the issue but not really sure about that.

Debugging the code I saw that the query string (on the SearchHandler)

978\ 90\ 04\ 23560\ 1

once passed through DismaxQueryParser (specifically through 
SolrPluginUtils.partialEscape(CharSequence)

becomes

978\\ 90\\ 04\\ 23560\\ 1

because that method escapes the backslashes

So, using the eclipse debugger I removed at runtime the additional backslash 
and it works perfectly but of course...I can't do that in production for every 
search :D

So, just to try, I changed dismax to edismax which, I saw, doesn't call 
SolrPluginUtils... and it works perfectly!

I saw in your query string that you used edismax too...maybe that's the point?

Many thanks
Andrea

On 08/26/2013 02:47 PM, Erick Erickson wrote:

Andrea:

Works for me, admittedly through the browser

I suspect the problem is here: ClientUtils.**escapeQueryChars

That doesn't do anything about escaping the spaces, it just handles
characters that have special meaning to the query syntax, things like +-
etc.

Using your field definition, this:
http://localhost:8983/solr/select?wt=json&q=ab\ cd\
ef&debug=query&defType=edismax&qf=name eoe
produced this output..

- parsedquery_toString: "+(eoe:abcdef | (name:ab name:cd name:ef))",


where the field "eoe" is your isbn_issn type.

Best,
Erick


On Mon, Aug 26, 2013 at 4:55 AM, Andrea Gazzarini <
andrea.gazzar...@gmail.com> wrote:


Hi Erick,
escaping spaces doesn't work...

Briefly,

- In a document I have an ISBN field that (stored value) is
*978-90-04-23560-1*
- In the index I have this value: *9789004235601*

Now, I want to be able to search the document by using:

1) q=*978-90-04-23560-1*
2) q=*978 90 04 23560 1*
3) q=*9789004235601*

1 and 3 works perfectly, 2 doesn't work.

My code is:

SolrQuery query = new SolrQuery(ClientUtils.escapeQueryChars(req.getParameter("q")));

isbn is declared in this way


 
 

 
 
 


search handler is:

 
 
 *dismax*

 100%
 
*isbn_issn_search*^1
 
 
*isbn_issn_search*^10
 
 0
 0.1
 ...
 

This is what I get:

*1) 978-90-04-23560-1**
*path=/select params={start=0&q=*978\-90\-**04\-23560\-1*&sfield=&qt=any_*
*bc&wt=javabin&rows=10&version=**2} *hits=1* status=0 QTime=5*

2) ***9789004235601*
*webapp=/solr path=/select params={start=0&q=***
9789004235601*&sfield=&qt=any_**bc&wt=javabin&rows=10&version=**2}
*hits=1* status=0 QTime=5*

3) **978 90 04 23560 1**
*path=/select params={start=0&*q=978\+90\+**04\+23560\+1*&sfield=&qt=any_*
*bc&wt=javabin&rows=10&version=**2} *hits=0 *status=0 QTime=2*

*Extract from queryDebug=true:

978\ 90\ 04\ 23560\ 1
...
978\ 90\ 04\ 23560\ 1
978\ 90\ 04\ 23560\ 1
...

 +((DisjunctionMaxQuery((isbn_**issn_search:*978*^1.0)~0.**1)
 DisjunctionMaxQuery((isbn_**issn_search:*90*^1.0)~0.1)
 DisjunctionMaxQuery((isbn_**issn_search:*04*^1.0)~0.1)
 DisjunctionMaxQuery((isbn_**issn_search:*23560*^1.0)~**0.1)
 DisjunctionMaxQuery((isbn_**issn_search:*1*^1.0)~0.1))**~5)
 DisjunctionMaxQuery((isbn_**issn_search:*9789004235601*^**
10.0)~0.1)


--**--
Probably this is a very stupid question but I'm going crazy. In this page

http://wiki.apache.org/solr/**DisMaxQParserPlugin

*Query Structure*

/For each "word" in the query string, dismax builds a DisjunctionMaxQuery
object for that word across all of the fields in the //qf//param...

/And seems exactly what it is doing...but what is a "word"? How can I
force//(without using double quotes) spaces in a way that they are
considered part of the word/?

/Many many many thanks
Andrea


On 08/13/2013 04:18 PM, Erick Erickson wrote:


I think you can get what you want by escaping the space with a
backslash

YMMV of course.
Erick


On Tue, Aug 13, 2013 at 9:11 AM, Andrea Gazzarini <
andrea.gazzar...@gmail.com> wrote:

  Hi Erick,

sorry if that wasn't clear: this is what I'm actually observing in my
application.

I wrote the first post after looking at the explain (debugQuery=true):
the
query

q=mag 778 G 69

is 

Re: Tokenization at query time

2013-08-26 Thread Erick Erickson
Andrea:

Works for me, admittedly through the browser

I suspect the problem is here: ClientUtils.**escapeQueryChars

That doesn't do anything about escaping the spaces, it just handles
characters that have special meaning to the query syntax, things like +-
etc.

Using your field definition, this:
http://localhost:8983/solr/select?wt=json&q=ab\ cd\
ef&debug=query&defType=edismax&qf=name eoe
produced this output..

   - parsedquery_toString: "+(eoe:abcdef | (name:ab name:cd name:ef))",


where the field "eoe" is your isbn_issn type.

Best,
Erick


On Mon, Aug 26, 2013 at 4:55 AM, Andrea Gazzarini <
andrea.gazzar...@gmail.com> wrote:

> Hi Erick,
> escaping spaces doesn't work...
>
> Briefly,
>
> - In a document I have an ISBN field that (stored value) is
> *978-90-04-23560-1*
> - In the index I have this value: *9789004235601*
>
> Now, I want to be able to search the document by using:
>
> 1) q=*978-90-04-23560-1*
> 2) q=*978 90 04 23560 1*
> 3) q=*9789004235601*
>
> 1 and 3 works perfectly, 2 doesn't work.
>
> My code is:
>
> SolrQuery query = new SolrQuery(ClientUtils.escapeQueryChars(req.getParameter("q")));
>
> isbn is declared in this way
>
>  positionIncrementGap="100">
> 
> 
>
> 
>  generateWordParts="0" generateNumberParts="0" catenateWords="0"
> catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"/>
> 
> 
> 
> search handler is:
>
>  default="true">
> 
> *dismax*
>
> 100%
> 
> *isbn_issn_search*^1
> 
> 
> *isbn_issn_search*^10
> 
> 0
> 0.1
> ...
> 
>
> This is what I get:
>
> *1) 978-90-04-23560-1**
> *path=/select params={start=0&q=*978\-90\-**04\-23560\-1*&sfield=&qt=any_*
> *bc&wt=javabin&rows=10&version=**2} *hits=1* status=0 QTime=5*
>
> 2) ***9789004235601*
> *webapp=/solr path=/select params={start=0&q=***
> 9789004235601*&sfield=&qt=any_**bc&wt=javabin&rows=10&version=**2}
> *hits=1* status=0 QTime=5*
>
> 3) **978 90 04 23560 1**
> *path=/select params={start=0&*q=978\+90\+**04\+23560\+1*&sfield=&qt=any_*
> *bc&wt=javabin&rows=10&version=**2} *hits=0 *status=0 QTime=2*
>
> *Extract from queryDebug=true:
>
> 978\ 90\ 04\ 23560\ 1
> ...
> 978\ 90\ 04\ 23560\ 1
> 978\ 90\ 04\ 23560\ 1
> ...
> 
> +((DisjunctionMaxQuery((isbn_**issn_search:*978*^1.0)~0.**1)
> DisjunctionMaxQuery((isbn_**issn_search:*90*^1.0)~0.1)
> DisjunctionMaxQuery((isbn_**issn_search:*04*^1.0)~0.1)
> DisjunctionMaxQuery((isbn_**issn_search:*23560*^1.0)~**0.1)
> DisjunctionMaxQuery((isbn_**issn_search:*1*^1.0)~0.1))**~5)
> DisjunctionMaxQuery((isbn_**issn_search:*9789004235601*^**
> 10.0)~0.1)
> 
>
> --**--
> Probably this is a very stupid question but I'm going crazy. In this page
>
> http://wiki.apache.org/solr/**DisMaxQParserPlugin
>
> *Query Structure*
>
> /For each "word" in the query string, dismax builds a DisjunctionMaxQuery
> object for that word across all of the fields in the //qf//param...
>
> /And seems exactly what it is doing...but what is a "word"? How can I
> force//(without using double quotes) spaces in a way that they are
> considered part of the word/?
>
> /Many many many thanks
> Andrea
>
>
> On 08/13/2013 04:18 PM, Erick Erickson wrote:
>
>> I think you can get what you want by escaping the space with a
>> backslash
>>
>> YMMV of course.
>> Erick
>>
>>
>> On Tue, Aug 13, 2013 at 9:11 AM, Andrea Gazzarini <
>> andrea.gazzar...@gmail.com> wrote:
>>
>>  Hi Erick,
>>> sorry if that wasn't clear: this is what I'm actually observing in my
>>> application.
>>>
>>> I wrote the first post after looking at the explain (debugQuery=true):
>>> the
>>> query
>>>
>>> q=mag 778 G 69
>>>
>>> is translated as follow:
>>>
>>>
>>> /  +((DisjunctionMaxQuery((//myfield://*mag*//^3000.0)~0.1)
>>>DisjunctionMaxQuery((//myfield://*778*//^3000.0)~0.1)
>>>DisjunctionMaxQuery((//myfield://*g*//^3000.0)~0.1)
>>>DisjunctionMaxQuery((//myfield://*69*//^3000.0)~0.1))~4)
>>>DisjunctionMaxQuery((//myfield://*mag778g69*//^3.**
>>> **0)~0.1)/
>>>
>>> It seems that althouhg I declare myfield with this type
>>>
>>> /
>>>
>>>  
>>>  
>>>
>>>  
>>>  >> generateWordParts="0" generateNumberParts="0"
>>>  catenateWords="0" catenateNumbers="0" catenateAll="1"
>>> splitOnCaseChange="0"
>>>
>>> />
>>>  
>>> 
>>>
>>> /SOLR is tokenizing it therefore by producing several tokens
>>> (mag,778,g,69)/
>>> /
>>>
>>> And I can't put double quotes on the query (q="mag 778 G 69") because the
>>> request handler searches also in other fields (with different
>>> configuration
>>> chains)
>>>
>>> As I understood the query parser, (i.e. query time), does a whitespace
>>> tokenization on its own before invoking my (query-time) chain. The 

Re: Solr 4.2.1 update to 4.3/4.4 problem

2013-08-26 Thread skorrapa
I have also re-indexed the data and tried. I also tried with the below config:
  
  


  



  



  

This didn't work either...
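For comparison, a common shape for whole-value, case-insensitive matching is a
keyword-tokenized text type with a lowercase filter; a sketch, where the type
name is an assumption:

  <fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>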



On Mon, Aug 26, 2013 at 4:03 PM, skorrapa [via Lucene] <
ml-node+s472066n4086601...@n3.nabble.com> wrote:

> Hello All,
>
> I am still facing the same issue. Case-insensitive search is not working on
> Solr 4.3.
> I am using the below configurations in schema.xml
>  sortMissingLast="true" omitNorms="true">
>   
> 
> 
>   
> 
> 
> 
>   
> 
> 
> 
>   
> 
> Basically I want my string which could have spaces or characters like '-'
> or \ to be searched upon case insensitively.
> Please help.
>
>
> --
>  If you reply to this email, your message will be added to the discussion
> below:
>
> http://lucene.472066.n3.nabble.com/Solr-4-2-1-update-to-4-3-4-4-problem-tp4081896p4086601.html
>  To unsubscribe from Solr 4.2.1 update to 4.3/4.4 problem, click 
> here
> .
> NAML
>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-2-1-update-to-4-3-4-4-problem-tp4081896p4086606.html
Sent from the Solr - User mailing list archive at Nabble.com.

ERROR org.apache.solr.update.CommitTracker – auto commit error...:org.apache.solr.common.SolrException: Error opening new searcher

2013-08-26 Thread zhaoxin
470665 [commitScheduler-14-thread-1] ERROR
org.apache.solr.update.CommitTracker  – auto commit
error...:org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1522)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1634)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:574)
at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.ClassCastException



--
View this message in context: 
http://lucene.472066.n3.nabble.com/ERROR-org-apache-solr-update-CommitTracker-auto-commit-error-org-apache-solr-common-SolrException-Err-tp4086576.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: "Caused by: java.net.SocketException: Connection reset by peer: socket write error" solr querying

2013-08-26 Thread aniljayanti
Hi Greg,

thanks for reply,

I tried to set the maxIdleTime to 30 milliseconds, but I am still getting the
same error.

WARN  - 2013-08-26 09:44:29.058; org.eclipse.jetty.server.Response;
Committed before 500 {msg=Connection reset by peer: socket write
error,trace=org.eclipse.jetty.io.EofException
at 
org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:914)
at
org.eclipse.jetty.http.AbstractGenerator.blockForOutput(AbstractGenerator.java:507)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:170)
at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:272)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:276)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
at org.apache.solr.util.FastWriter.flush(FastWriter.java:137)
at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:648)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:375)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketException: Connection reset by peer: socket write
error
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at 
org.eclipse.jetty.io.ByteArrayBuffer.writeTo(ByteArrayBuffer.java:375)
at 
org.eclipse.jetty.io.bio.StreamEndPoint.flush(StreamEndPoint.java:164)
at 
org.eclipse.jetty.io.bio.StreamEndPoint.flush(StreamEndPoint.java:182)
at 
org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:841)
... 37 more
,code=500}
WARN  - 2013-08-26 09:44:29.060; org.eclipse.jetty.servlet.ServletHandler;
/solr/324/select
java.lang.IllegalStateException: Committed
at org.eclipse.jetty.server.Response.resetBuffer(Response.java:1144)
at org.eclipse.jetty.server.Response.sendError(Response.java:314)
at
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:695)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:383)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.ser

Re: Solr 4.2.1 update to 4.3/4.4 problem

2013-08-26 Thread skorrapa
Hello All,

I am still facing the same issue. Case-insensitive search is not working on
Solr 4.3.
I am using the below configuration in schema.xml:

 sortMissingLast="true" omitNorms="true">
Basically I want my string, which could have spaces or characters like '-' or
\, to be searched case-insensitively.
Please help.
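
For reference, a case-insensitive "exact match" type is usually built from a
KeywordTokenizer (so the whole value, spaces and '-' included, stays one token)
followed by a lowercase filter. A minimal sketch, assuming that is the intent
here; the type and field names are placeholders rather than the poster's exact
schema:

<!-- sketch: keep the full value as a single token, lowercased at index and query time -->
<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="my_ci_field" type="string_ci" indexed="true" stored="true"/>

Note that queries containing spaces would typically still need to be quoted or
escaped, because the query parser splits on whitespace before analysis runs.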




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-2-1-update-to-4-3-4-4-problem-tp4081896p4086601.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Tokenization at query time

2013-08-26 Thread Andrea Gazzarini

Hi Erick,
escaping spaces doesn't work...

Briefly,

- In a document I have an ISBN field whose stored value is 978-90-04-23560-1

- In the index I have this value: 9789004235601

Now, I want to be able to search for the document using:

1) q=978-90-04-23560-1
2) q=978 90 04 23560 1
3) q=9789004235601

1 and 3 work perfectly; 2 doesn't.

My code is:

SolrQuery query = new SolrQuery(ClientUtils.escapeQueryChars(req.getParameter("q")));


isbn is declared in this way

positionIncrementGap="100">
generateWordParts="0" generateNumberParts="0" catenateWords="0"
catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"/>
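
Reconstructed for readability, the type presumably looks something like the
sketch below; only the positionIncrementGap and WordDelimiterFilterFactory
attributes are certain, the tokenizer and anything else in the chain are
assumptions:

<fieldType name="isbn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- catenateAll="1" glues all parts into a single token, e.g. 9789004235601 -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="0" catenateWords="0"
            catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"/>
  </analyzer>
</fieldType>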




search handler is:

default="true">
dismax
100%
isbn_issn_search^1
isbn_issn_search^10
0
0.1
...
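
A handler with those defaults would typically be declared along the lines of
the sketch below; the handler name is inferred from qt=any_bc in the request
logs, and mapping the bare 0 and 0.1 values to ps and tie is a guess:

<requestHandler name="any_bc" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="mm">100%</str>
    <str name="qf">isbn_issn_search^1</str>
    <str name="pf">isbn_issn_search^10</str>
    <str name="ps">0</str>        <!-- guess: the bare "0" above -->
    <float name="tie">0.1</float> <!-- guess: the bare "0.1" above -->
  </lst>
</requestHandler>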


This is what I get:

1) 978-90-04-23560-1
path=/select
params={start=0&q=978\-90\-04\-23560\-1&sfield=&qt=any_bc&wt=javabin&rows=10&version=2}
hits=1 status=0 QTime=5


2) 9789004235601
webapp=/solr path=/select
params={start=0&q=9789004235601&sfield=&qt=any_bc&wt=javabin&rows=10&version=2}
hits=1 status=0 QTime=5


3) 978 90 04 23560 1
path=/select
params={start=0&q=978\+90\+04\+23560\+1&sfield=&qt=any_bc&wt=javabin&rows=10&version=2}
hits=0 status=0 QTime=2


Extract from debugQuery=true:

978\ 90\ 04\ 23560\ 1
...
978\ 90\ 04\ 23560\ 1
978\ 90\ 04\ 23560\ 1
...

+((DisjunctionMaxQuery((isbn_issn_search:978^1.0)~0.1)
DisjunctionMaxQuery((isbn_issn_search:90^1.0)~0.1)
DisjunctionMaxQuery((isbn_issn_search:04^1.0)~0.1)
DisjunctionMaxQuery((isbn_issn_search:23560^1.0)~0.1)
DisjunctionMaxQuery((isbn_issn_search:1^1.0)~0.1))~5)
DisjunctionMaxQuery((isbn_issn_search:9789004235601^10.0)~0.1)



Probably this is a very stupid question, but I'm going crazy. This page

http://wiki.apache.org/solr/DisMaxQParserPlugin

says, under "Query Structure":

"For each "word" in the query string, dismax builds a DisjunctionMaxQuery
object for that word across all of the fields in the qf param..."

And that seems to be exactly what it is doing... but what is a "word"? How can
I force spaces (without using double quotes) to be treated as part of the word?


Many many many thanks,
Andrea

On 08/13/2013 04:18 PM, Erick Erickson wrote:

I think you can get what you want by escaping the space with a backslash

YMMV of course.
Erick


On Tue, Aug 13, 2013 at 9:11 AM, Andrea Gazzarini <
andrea.gazzar...@gmail.com> wrote:


Hi Erick,
sorry if that wasn't clear: this is what I'm actually observing in my
application.

I wrote the first post after looking at the explain (debugQuery=true): the
query

q=mag 778 G 69

is translated as follows:


  +((DisjunctionMaxQuery((myfield:mag^3000.0)~0.1)
    DisjunctionMaxQuery((myfield:778^3000.0)~0.1)
    DisjunctionMaxQuery((myfield:g^3000.0)~0.1)
    DisjunctionMaxQuery((myfield:69^3000.0)~0.1))~4)
    DisjunctionMaxQuery((myfield:mag778g69^3.0)~0.1)

It seems that although I declare myfield with this type, Solr is tokenizing it
anyway, producing several tokens (mag, 778, g, 69).

And I can't put double quotes around the query (q="mag 778 G 69") because the
request handler also searches other fields (with different analysis chains).

As I understand it, the query parser (i.e., at query time) does whitespace
tokenization on its own before invoking my (query-time) chain. The same doesn't
happen at index time, and that is my problem: at index time the field is
analyzed exactly as I want, but unfortunately I cannot say the same at query
time.

Sorry for my wonderful english, did you get the point?


On 08/13/2013 02:18 PM, Erick Erickson wrote:


On a quick scan I don't see a problem here. Attach
&debug=query to your url and that'll show you the
parsed query, which will in turn show you what's been
pushed through the analysis chain you've defined.

You haven't stated whether you've tried this and it's
not working or you're looking for guidance as to how
to accomplish this so it's a little unclear how to
respond.

BTW, the admin/analysis page is your friend here

Best
Erick


On Mon, Aug 12, 2013 at 12:52 PM, Andrea Gazzarini <
andrea.gazzar...@gmail.com> wrote:

  Clear, thanks for the response.

So, if I have two field types, where the first turns Mag. 78 D 99 into
mag78d99 and the second ends up with several tokens.

And I want to use the same request handler to query against both of them.
I mean, I want the user to search for something like

http///search?q=Mag 78 D 99

and this search should search with

Re: Dropping Caches of Machine That Solr Runs At

2013-08-26 Thread Furkan KAMACI
Hi Walter;

You are right about performance. However, when I index documents on a machine
that has a high percentage of physical memory in use, I get EOF errors. Why is
that?


2013/8/26 Walter Underwood 

> On Aug 25, 2013, at 1:41 PM, Furkan KAMACI wrote:
>
> > Sometimes physical memory usage of Solr is over 99% and this may cause
> > problems. Do you run this kind of command periodically:
> >
> > sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
> >
> > to force dropping the caches of the machine that Solr runs on, and avoid problems?
>
>
> This is a terrible idea. The OS automatically manages the file buffers.
> When they are all used, that is a good thing, because it reduces disk IO.
>
> After this, no files will be cached in RAM. Every single read from a file
> will have to go to disk. This will cause very slow performance until the
> files are recached.
>
> Recently, I did exactly the opposite to improve performance in our Solr
> installation. Before starting the Solr process, a script reads every file
> in the index so that it will already be in file buffers. This avoids
> several minutes of high disk IO and slow performance after startup.
>
> wunder
> Search Guy, Chegg.com
>
>
>
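
A different, Solr-level take on the same cold-start problem (not the OS-level
file pre-read described above, just an adjacent option) is a firstSearcher
warming listener in solrconfig.xml; a minimal sketch:

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- run a representative query before the first searcher serves traffic -->
    <lst>
      <str name="q">*:*</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>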


SimpleFacet feature combinations..

2013-08-26 Thread Bram Van Dam

Hi folks,

Some of the features of SimpleFacet can't be combined -- the most 
notable missing combination being range + pivot. Another combination 
which we'd find very useful is integration with StatsComponent 
(pivot/ranged stats).


Is anyone working on this? Or willing to work on this? This is a rather 
important feature for us, one which we currently implement by launching 
N+1 queries (or worse). Given the importance, I would be willing and 
able to donate some of my time to work on this. However, not being very 
familiar with the solr internals, it would probably be easier to team up 
with someone else on this?


If anyone is interested, feel free to get in touch.

 - Bram


Re: custom names for replicas in solrcloud

2013-08-26 Thread YouPeng Yang
Hi  smanad

   If I am not mistaken, you can append the coreNodeName parameter to your
creation command:

http://10.7.23.125:8080/solr/admin/cores?action=CREATE&name=dfscore8_3&shard=shard3_3&collection.configName=myconf&schema=schema.xml&config=solrconfig3.xml&collection=collection1&dataDir=/soData/&coreNodeName=heihei

   Hope this helps.


Regards



2013/8/23 smanad 

> Hi,
>
> I am using Solr 4.3 with 3 Solr hosts and an external ZooKeeper
> ensemble of 3 servers. And just 1 shard currently.
>
> When I create collections using the Collections API, it creates them with the
> names collection1_shard1_replica1, collection1_shard1_replica2,
> collection1_shard1_replica3.
> Is there any way to pass a custom name? Or can I have all the replicas use the
> same name?
>
> Any pointers will be much appreciated.
> Thanks,
> -Manasi
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/custom-names-for-replicas-in-solrcloud-tp4086205.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: How to patch Solr4.2 for SolrEnityProcessor Sub-Enity issue

2013-08-26 Thread harshchawla
Thanks a lot in advance. I am eagerly waiting for your response.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-patch-Solr4-2-for-SolrEnityProcessor-Sub-Enity-issue-tp4086292p4086572.html
Sent from the Solr - User mailing list archive at Nabble.com.