Re: Personalized Search Results or Matching Documents to Users

2015-07-31 Thread Umesh Prasad
* How often are documents assigned to new users?
* How many documents does a user typically have?
* Do you have a 'trigger' in your app that tells you a user has been assigned a new doc?
  
   You can use a pseudo join to implement this sort of thing - have a
   different core that contains the 'permissions', either a document that
   says this document ID is accessible via these users or this user is
   allowed to see these document IDs. You are keeping your fast moving
   (authorization) data separate from your slow moving (the docs
   themselves) data.
  
   You can then say find me all documents that are accessible via user X
  
   Upayavira
  




-- 
Thanks  Regards
Umesh Prasad
Tech Lead @ flipkart.com

 in.linkedin.com/pub/umesh-prasad/6/5bb/580/


Re: Solr hangs / LRU operations are heavy on cpu

2015-03-22 Thread Umesh Prasad




-- 
Thanks  Regards
Umesh Prasad
Tech Lead @ flipkart.com

 in.linkedin.com/pub/umesh-prasad/6/5bb/580/


Re: Solr hangs / LRU operations are heavy on cpu

2015-03-19 Thread Umesh Prasad
It might be because LRUCache by default will try to evict its entries on
each call to put and putAll. LRUCache is built on top of java's
LinkedHashMap. Check the javadoc of removeEldestEntry
http://docs.oracle.com/javase/7/docs/api/java/util/LinkedHashMap.html#removeEldestEntry%28java.util.Map.Entry%29
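
For reference, a minimal sketch of that mechanism (a plain LinkedHashMap, not Solr's actual LRUCache code): removeEldestEntry is consulted on every insert, which is where the per-put eviction work comes from.

import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU map built on LinkedHashMap, for illustration only.
// removeEldestEntry is called by LinkedHashMap after each put/putAll insertion,
// so eviction work happens on every write.
public class SimpleLruMap<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    public SimpleLruMap(int maxSize) {
        // accessOrder=true keeps least-recently-accessed entries first in iteration order
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // evict the eldest entry once the map grows past maxSize
        return size() > maxSize;
    }
}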


Try using LFUCache and a separate cleanup thread .. We have been using that
for over 2 yrs now without any issues ..

For a comparison of the caches in Solr you can check this link:
https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig

On 20 March 2015 at 04:05, Sergey Shvets ser...@bintime.com wrote:

 LRUCache


It


-- 
Thanks  Regards
Umesh Prasad
Tech Lead @ flipkart.com

 in.linkedin.com/pub/umesh-prasad/6/5bb/580/


Re: Grouping based on multiple filters/criterias

2014-08-22 Thread Umesh Prasad
Solr does support date math in filters / queries, so your timestamp intervals
can be dynamic ..
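
For illustration, a hedged SolrJ sketch of dynamic intervals built with date math; the field name "timestamp" and the bucket boundaries are assumptions.

import org.apache.solr.client.solrj.SolrQuery;

// Sketch: dynamic time buckets expressed with Solr date math, both as a filter
// query and as group.query clauses assembled at request time.
public class DateMathGroupingExample {
    public static SolrQuery build() {
        SolrQuery q = new SolrQuery("*:*");
        q.addFilterQuery("timestamp:[NOW/DAY-7DAYS TO NOW]");          // only the last 7 days
        q.set("group", true);
        q.add("group.query", "timestamp:[NOW-1HOUR TO NOW]");          // bucket 1: last hour
        q.add("group.query", "timestamp:[NOW-1DAY TO NOW-1HOUR]");     // bucket 2: rest of the last day
        q.add("group.query", "timestamp:[NOW/DAY-7DAYS TO NOW-1DAY]"); // bucket 3: rest of the week
        return q;
    }
}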




On 22 August 2014 05:51, deniz denizdurmu...@gmail.com wrote:

 umeshprasad wrote
  Grouping supports group by queries.
 
  https://cwiki.apache.org/confluence/display/solr/Result+Grouping
 
  However you will need to form the group queries before hand.
 
  Thanks  Regards
  Umesh Prasad
  Search

  Lead@

 
   in.linkedin.com/pub/umesh-prasad/6/5bb/580/

 have seen this page before but it is not providing the functionality that I
 need, because the timestamp interval would be seriously tricky, as it is
 supposed to be dynamic...

 though i have found another solution to handle this out of Solr :)



 -
 Zeki ama calismiyor... Calissa yapar...
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Grouping-based-on-multiple-filters-criterias-tp4153462p4154343.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Thanks  Regards
Umesh Prasad
Search l...@flipkart.com

 in.linkedin.com/pub/umesh-prasad/6/5bb/580/


Re: Dynamically loaded core.properties file

2014-08-20 Thread Umesh Prasad
The core discovery process is dependent on presence of core.properties file
in the particular directory.

You can have a script, which will traverse the directory structure of core
base directory and depending on env/host name, will either restore
core.properties or rename it to a different file.

The script will have to run before solr starts. So solr will see the
directory structures, but core.properties will be missing from directories
which you do not want to load (renamed as core.properties.bkp)
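
A rough sketch of what such a pre-start step could look like; the base directory, the environment variable and the per-host rule are assumptions, and a shell script would work just as well.

import java.io.IOException;
import java.nio.file.*;

// Hypothetical pre-start step: walk the core base directory and rename
// core.properties -> core.properties.bkp for cores that should not be
// discovered on this host, restoring it for cores that should.
public class CoreDiscoveryToggle {
    public static void main(String[] args) throws IOException {
        Path coreBaseDir = Paths.get(args[0]);       // e.g. /var/solr/cores (assumption)
        String env = System.getenv("SOLR_ENV");      // e.g. "dev" or "prod" (assumption)

        try (DirectoryStream<Path> cores = Files.newDirectoryStream(coreBaseDir)) {
            for (Path coreDir : cores) {
                Path props = coreDir.resolve("core.properties");
                Path backup = coreDir.resolve("core.properties.bkp");
                boolean wanted = isWantedOnThisHost(coreDir, env);
                if (!wanted && Files.exists(props)) {
                    Files.move(props, backup, StandardCopyOption.REPLACE_EXISTING);
                } else if (wanted && Files.exists(backup)) {
                    Files.move(backup, props, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    // placeholder: decide per core and per environment which cores this host should load
    private static boolean isWantedOnThisHost(Path coreDir, String env) {
        return true;
    }
}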

We are already using this approach to control core discovery in prod (we
have 40 plus cores and we co-host only a couple of them on a single server.)




On 21 August 2014 04:41, Erick Erickson erickerick...@gmail.com wrote:

 OK, not quite sure if this would work, but

 In each core.properties file, put in a line similar to what Chris
 suggested:
 properties=${env}/custom.properties

 You might be able to now define your sys var like
 -Drelative_or_absolute_path_to_dev_custom.proerties file.
 or
 -Drelative_or_absolute_path_to_prod_custom.proerties file.
 on Solr startup. Then in the custom.properties file you have whatever
 you need to define to make the prod/dev distinction you need.

 WARNING: I'm not entirely sure that relative pathing works here, which
 just means I haven't tried it.

 Best,
 Erick

 On Wed, Aug 20, 2014 at 3:11 PM, Ryan Josal ry...@pointinside.com wrote:
  Thanks Erick, that mirrors my thoughts exactly.  If core.properties had
  property expansion it would work for this, but I agree with not
 supporting
  that for the complexities it introduces, and I'm not sure it's the right
 way
  to solve it anyway.  So, it doesn't really handle my problem.
 
  I think because the properties file I want to load is not actually
 related
  to any core, it makes it easier to solve.  So if solr.xml is no longer
  rewritten then it seems like a global properties file could safely be
  specified there using property expansion.  Or maybe there is some way to
  write some code that could get executed before schema and solrconfig are
  parsed, although I'm not sure how that would work given how you need
  solrconfig to load the libraries and define plugins.
 
  Ryan
 
 
  On 08/20/2014 01:07 PM, Erick Erickson wrote:
 
  Hmmm, I was going to make a code change to do this, but Chris
  Hostetter saved me from the madness that ensues. Here's his comment on
  the JIRA that I did open (but then closed), does this handle your
  problem?
 
  I don't think we want to make the name of core.properties be variable
  ... that way leads to madness and confusion.
 
  the request on the user list was about being able to dynamically load
  a property file with diff values between dev  production like you
  could do in the old style solr.xml – that doesn't mean core.properties
  needs to have a configurable name, it just means there needs to be a
  configurable way to load properties.
 
  we already have a properties option which can be specified in
  core.properties to point to an additional external file that should
  also be loaded ... if variable substitution was in play when parsing
  core.properties then you could have something like
  properties=custom.${env}.properties in core.properties ... but
  introducing variable substitution into thecore.properties (which solr
  both reads  writes based on CoreAdmin calls) brings back the host of
  complexities involved when we had persistence of solr.xml as a
  feature, with the questions about persisting the original values with
  variables in them, vs the values after evaluating variables.
 
  Best,
  Erick
 
  On Wed, Aug 20, 2014 at 11:36 AM, Ryan Josal ry...@pointinside.com
  wrote:
 
  Hi all, I have a question about dynamically loading a core properties
  file
  with the new core discovery method of defining cores.  The concept is
  that I
  can have a dev.properties file and a prod.properties file, and specify
  which
  one to load with -Dsolr.env=dev.  This way I can have one file which
  specifies a bunch of runtime properties like external servers a plugin
  might
  use, etc.
 
  Previously I was able to do this in solr.xml because it can do system
  property substitution when defining which properties file to use for a
  core.
 
  Now I'm not sure how to do this with core discovery, since the core is
  discovered based on this file, and now the file needs to contain things
  that
  are specific to that core, like name, which previously were defined in
  the
  xml definition.
 
  Is there a way I can plugin some code that gets run before any schema
 or
  solrconfigs are parsed?  That way I could write a property loader that
  adds
  properties from ${solr.env}.properties to the JVM system properties.
 
  Thanks!
  Ryan
 
 




-- 
Thanks  Regards
Umesh Prasad
Search l...@flipkart.com

 in.linkedin.com/pub/umesh-prasad/6/5bb/580/


Re: logging in solr

2014-08-20 Thread Umesh Prasad
Or you could use system properties to control that.

For example, if you are using logback, then

JAVA_OPTS="$JAVA_OPTS -Dlogback.configurationFile=$CATALINA_BASE/conf/logback.xml"

will do it.




On 20 August 2014 03:15, Aman Tandon amantandon...@gmail.com wrote:

 As you are using tomcat you can configure the log file name, folder,etc. by
 configuring the server.xml present in the Conf directory of tomcat.
 On Aug 19, 2014 4:17 AM, Shawn Heisey s...@elyograg.org wrote:

  On 8/18/2014 2:43 PM, M, Arjun (NSN - IN/Bangalore) wrote:
   Currently in my component Solr is logging to catalina.out. What
  is the configuration needed to redirect those logs to some custom logfile
  eg: Solr.log.
 
  Solr uses the slf4j library for logging.  Simply change your program to
  use slf4j, and very likely the logs will go to the same place the Solr
  logs do.
 
  http://www.slf4j.org/manual.html
 
  See also the wiki page on logging jars and Solr:
 
  http://wiki.apache.org/solr/SolrLogging
 
  Thanks,
  Shawn
 
 




-- 
Thanks  Regards
Umesh Prasad
Search l...@flipkart.com

 in.linkedin.com/pub/umesh-prasad/6/5bb/580/


Re: Substring and Case In sensitive Search

2014-08-20 Thread Umesh Prasad
The performance of wildcard queries, especially those with a leading wildcard
(e.g. *abc), can be quite slow.

http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/search/WildcardQuery.html

Also, you won't be able to time them out.

Take a look at ReversedWildcardFilter

http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/analysis/ReversedWildcardFilterFactory.html

The blog post describes it nicely ..

http://solr.pl/en/2011/10/10/%E2%80%9Ccar-sale-application%E2%80%9D-%E2%80%93-solr-reversedwildcardfilter-%E2%80%93-lets-optimize-wildcard-queries-part-8/



On 19 August 2014 22:19, Jack Krupansky j...@basetechnology.com wrote:

 Substring search a string field using wildcard, *, at beginning and end
 of query term.

 Case-insensitive match on string field is not supported.

 Instead, copy the string field to a text field, use the keyword tokenizer,
 and then apply the lower case filter.

 But... review your use case to confirm whether you really need to use
 string as opposed to text field.

 -- Jack Krupansky

 -Original Message- From: Nishanth S
 Sent: Tuesday, August 19, 2014 12:03 PM
 To: solr-user@lucene.apache.org
 Subject: Substring and Case In sensitive Search


 Hi,

 I am  very new to solr.How can I allow solr search on a string field case
 insensitive and substring?.

 Thanks,
 Nishanth




-- 
Thanks  Regards
Umesh Prasad
Search l...@flipkart.com

 in.linkedin.com/pub/umesh-prasad/6/5bb/580/


Re: Grouping based on multiple filters/criterias

2014-08-20 Thread Umesh Prasad
Grouping supports group by queries.

https://cwiki.apache.org/confluence/display/solr/Result+Grouping

However, you will need to form the group queries beforehand.







On 18 August 2014 12:47, deniz denizdurmu...@gmail.com wrote:

 is it possible to have multiple filters/criterias on grouping? I am trying
 to
 do something like those, and I am assuming that from the statuses of the
 tickets, it doesnt seem possible?

 https://issues.apache.org/jira/browse/SOLR-2553
 https://issues.apache.org/jira/browse/SOLR-2526
 https://issues.apache.org/jira/browse/LUCENE-3257

 To make everything clear, here is details which I am planning to do with
 Solr...

 so there is an activity feed of a site and it is basically working like
 facebook or linkedin newsfeed, though there is no relationship between
 users, it doesnt matter if i am following someone or not, as long as their
 settings allows me to see their posts and they hit my search filter, i will
 see their posts.

 the part related with grouping is tricky... so lets assume that you are
 able
 to see my posts, and I have posted 8 activities in the last one hour, those
 activities should appear different than other posts, as it would be a
 combined view of the posts...

 i.e
  deniz
   activity one
   activity two
   .
   activity eight
  /deniz
  other user 1
  single activity
  /other user 1
  another user 1
  single activity
   /another user 1
   other user 2
  activity one
  activity two
   /other user 2

 So here the results should be grouped depending on their post times...

 on solr (4.7.2), i am indexing activities as documents, and each document
 has bunch of fields including timestamp and source_user etc etc.

 is it possible to do this on current solr?

 (in case the details are not clear, please feel free to ask for more
 details
 :) )







 -
 Zeki ama calismiyor... Calissa yapar...
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Grouping-based-on-multiple-filters-criterias-tp4153462.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Thanks  Regards
Umesh Prasad
Search l...@flipkart.com

 in.linkedin.com/pub/umesh-prasad/6/5bb/580/


Re: Selectively setting the number of returned SOLR rows per field based on field value

2014-08-20 Thread Umesh Prasad
Field Collapsing has a limitation: currently it will not allow you to get a
different number of results from each group.

You can plug in a custom AnalyticsQuery, which can do exactly what you want
after seeing each matching document.
https://cwiki.apache.org/confluence/display/solr/AnalyticsQuery+API
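
A rough, untested sketch of the shape of such a plugin, with class and method names taken from the AnalyticsQuery / DelegatingCollector API described on that page; the per-group limiting logic itself is only a placeholder.

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.search.AnalyticsQuery;
import org.apache.solr.search.DelegatingCollector;

// Sketch: the DelegatingCollector sees every matching document and decides
// whether to forward it to the normal collectors further down the chain.
public class PerGroupLimitQuery extends AnalyticsQuery {

    @Override
    public DelegatingCollector getAnalyticsCollector(ResponseBuilder rb, IndexSearcher searcher) {
        return new DelegatingCollector() {
            @Override
            public void collect(int doc) throws IOException {
                // look up the doc's group here (e.g. via a DocValues field) and
                // only forward it if that group's quota is not yet exhausted
                if (shouldKeep(doc)) {
                    super.collect(doc);   // pass the doc on to the delegate collectors
                }
            }

            private boolean shouldKeep(int doc) {
                return true;   // placeholder for the per-group counting logic
            }
        };
    }
}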




On 18 August 2014 04:32, Erick Erickson erickerick...@gmail.com wrote:

 Aurélien is correct, for the exact behavior you're looking
 for you'd need to run w queries.

 But you might be able to make do with field collapsing.
 You'd probably have to copyField from title to
 title_grouping which would be un-analyzed (string type
 or KeywordTokenizer), then group on _that_ field.
 You'd get back the top N matches grouped by title and
 your app could display that info however it made sense.

 Grouping sometimes goes by field collapsing FWIW.
 Erick

 On Sun, Aug 17, 2014 at 2:16 PM, talt mikaelsaltz...@gmail.com wrote:
  I have a field in my SOLR index, let's call it book_title.
 
  A query returns 15 rows with book_title:The Kite Runner, 13 rows with
  book_title:The Stranger, and 8 rows with book_title:The Ruby Way.
 
  Is there a way to return only the first row of The Kite Runner and The
  Stranger, but all of the The Ruby Way rows from the previous query
  result? This would result in 10 rows altogether. Is this possible at all,
  using a single query?
 
 
 
  --
  View this message in context:
 http://lucene.472066.n3.nabble.com/Selectively-setting-the-number-of-returned-SOLR-rows-per-field-based-on-field-value-tp4153441.html
  Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Thanks  Regards
Umesh Prasad
Search l...@flipkart.com

 in.linkedin.com/pub/umesh-prasad/6/5bb/580/


Re: indexing comments with Apache Solr

2014-08-06 Thread Umesh Prasad
 griddynamics blog  is useful. It has 4 parts which covers block join quite
well ..

http://blog.griddynamics.com/2012/08/block-join-query-performs.html
http://blog.griddynamics.com/2013/09/solr-block-join-support.html
http://blog.griddynamics.com/2013/12/grandchildren-and-siblings-with-block.html
http://blog.griddynamics.com/2014/01/segmented-filter-cache-in-solr.html

The github repo is https://gist.github.com/mkhludnev


On 6 August 2014 19:05, Ali Nazemian alinazem...@gmail.com wrote:

 Dear Alexandre,
 Hi,
 Thank you very much. I think nested document is what I need. Do you have
 more information about how can I define such thing in solr schema? Your
 mentioned blog post was all about retrieving nested docs.
 Best regards.


 On Wed, Aug 6, 2014 at 5:16 PM, Alexandre Rafalovitch arafa...@gmail.com
 wrote:

  You can index comments as child records. The structure of the Solr
  document should be able to incorporate both parents and children
  fields and you need to index them all together. Then, just search for
  JOIN syntax for nested documents. Also, latest Solr (4.9) has some
  extra functionality that allows you to find all parent pages and then
  expand children pages to match.
 
  E.g.: http://heliosearch.org/expand-block-join/ seems relevant
 
  Regards,
 Alex.
  Personal: http://www.outerthoughts.com/ and @arafalov
  Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
  Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
 
 
  On Wed, Aug 6, 2014 at 11:18 AM, Ali Nazemian alinazem...@gmail.com
  wrote:
   Dear Gora,
   I think you misunderstood my problem. Actually I used nutch for
 crawling
   websites and my problem is in index side and not crawl side. Suppose
 page
   is fetch and parsed by Nutch and all comments and the date and source
 of
   comments are identified by parsing. Now what can I do for indexing
 these
   comments? What is the document granularity?
   Best regards.
  
  
   On Wed, Aug 6, 2014 at 1:29 PM, Gora Mohanty g...@mimirtech.com
 wrote:
  
   On 6 August 2014 14:13, Ali Nazemian alinazem...@gmail.com wrote:
   
Dear all,
Hi,
I was wondering how can I mange to index comments in solr? suppose I
  am
going to index a web page that has a content of news and some
 comments
   that
are presented by people at the end of this page. How can I index
 these
comments in solr? consider the fact that I am going to do some
  analysis
   on
these comments. For example I want to have such query flexibility
 for
retrieving all comments that are presented between 24 June 2014 to
 24
   July
2014! or all the comments that are presented by specific person.
   Therefore
defining these comment as multi-value field would not be the
 solution
   since
in this case such query flexibility is not feasible. So what is you
suggestion about document granularity in this case? Can I consider
  all of
these comments as a new document inside main document (tree based
structure). What is your suggestion for this case? I think it is a
  common
case of indexing webpages these days so probably I am not the only
 one
thinking about this situation. Please share you though and perhaps
  your
experiences in this condition with me. Thank you very much.
  
   Parsing a web page, and breaking up parts up for indexing into
 different
   fields
   is out of the scope of Solr. You might want to look at Apache Nutch
  which
   can index into Solr, and/or other web crawlers/scrapers.
  
   Regards,
   Gora
  
  
  
  
   --
   A.Nazemian
 



 --
 A.Nazemian




-- 
Thanks  Regards
Umesh Prasad
Search l...@flipkart.com

 in.linkedin.com/pub/umesh-prasad/6/5bb/580/


Re: Modify/add/remove params at search component

2014-08-04 Thread Umesh Prasad
Use ModifiableSolrParams:

SolrParams params = rb.req.getParams();
ModifiableSolrParams modifiableSolrParams = new ModifiableSolrParams(params);
modifiableSolrParams.set("paramName", "paramValue");
rb.req.setParams(modifiableSolrParams);




On 4 August 2014 12:47, Lee Chunki lck7...@coupang.com wrote:

 Hi,

 I am building a new search component and it runs after QueryComponent.
 What I want to do is set params  like start, rows, query and so on at new
 search component.

 I could set/get query by using
 setQueryString()

 http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/handler/component/ResponseBuilder.html#setQueryString(java.lang.String)
 getQueryString()

 http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/handler/component/ResponseBuilder.html#getQueryString()

 and get params by using
 rb.req.getParams()

 but how can I set params at search component?

 Thanks,
 Chunki.




-- 
Thanks  Regards
Umesh Prasad
Search l...@flipkart.com

 in.linkedin.com/pub/umesh-prasad/6/5bb/580/


Re: Query on Facet

2014-08-02 Thread Umesh Prasad
 using below query to get facets with combination of language and its
 binding. But now I'm getting only selected facet in facetList of each
 field and its count. For e.g. in language facets the query is returning
 English and its count. Instead I need to get other language facets
 which satisfies binding type of paperback

 http://localhost:8080/solr/collection1/select?q=software%20testing&fq=language%3A(%22English%22)&fq=Binding%3A(%22paperback%22)&facet=true&facet.mincount=1&facet.field=Language&facet.field=latestArrivals&facet.field=Binding&wt=json&indent=true&defType=edismax&json.nl=map

 Please provide me your inputs.

 Thanks & Regards,

 Smitha



-- 
---
Thanks  Regards
Umesh Prasad


Re: Solr gives the same fieldnorm for two different-size fields

2014-08-02 Thread Umesh Prasad
What you really need is a covering type  match. I feel your use case fits
into this type

Score (Exact match in order) > Score (Exact match without order) > Score (Non Exact Match)

Example  Query : a b c

Example docs :
  d1 :  a b c
  d2 :  a c b
  d3 :  c a b
  d4 : a b c d
  d5 : a b c d e

Use case 1 : Only exact match is a match. (So only d1 is a match)
Use case 2 : Only in order are matches. So d2, d3 aren't matches. Scores are d1 > d4 > d5
Use case 3 : Only in order are matches. And only one extra term is allowed. So d2, d3, d5 aren't matches. Scores are d1 > d4
Use case 4 : All are matches and d1 > d2 > d3 > d4 > d5

All of these use cases can be satisfied by using SpanQueries, which track the
positions at which terms match. For a covering match, you will need to
introduce start and end sentinel terms during indexing.
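
For illustration, a minimal Lucene sketch of a covering, in-order match for the query "a b c"; the field name and sentinel tokens are assumptions.

import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

// Covering match for "a b c" on a field whose indexed token stream is: __start a b c __end
public class CoveringMatchExample {
    public static SpanNearQuery coveringMatch(String field) {
        SpanQuery[] clauses = new SpanQuery[] {
            new SpanTermQuery(new Term(field, "__start")),
            new SpanTermQuery(new Term(field, "a")),
            new SpanTermQuery(new Term(field, "b")),
            new SpanTermQuery(new Term(field, "c")),
            new SpanTermQuery(new Term(field, "__end"))
        };
        // slop 0 + inOrder: no extra terms between the sentinels (use case 1 above);
        // relaxing the slop to 1 roughly corresponds to "one extra term allowed" (use case 3).
        return new SpanNearQuery(clauses, 0, true);
    }
}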

There is an excellent post by Mark Miller about span queries
http://searchhub.org/2009/07/18/the-spanquery/
 Solr's SurroundQuery Parser allows you to create SpanQueries
http://wiki.apache.org/solr/SurroundQueryParser
Or you can plug your own query parser into solr to do the same.

Some more links you can get here ..
http://search-lucene.com/?q=span+queries&fc_project=Lucene&fc_project=Solr



On 1 August 2014 00:24, Erick Erickson erickerick...@gmail.com wrote:

 You can consider, say, a copyField directive and copy the field into a
 string type (or perhaps keyworTokenizer followed by lowerCaseFilter) and
 then match or boost on an exact match rather than trying to make scoring
 fill this role.

 In any case, I'm thinking of normalizing the sensitive fields and indexing
 them as a single token (i.e. the string type or keywordtokenizer) to
 disambiguate these cases.

 Because otherwise I fear you'll get one situation to work, then fail on the
 next case. In your example, you're trying to use length normalization to
 influence scoring to get the doc with the shorter field to sort above the
 doc with the longer field. But what are you going to do when your target is
 university of california berkley research? Rely on matching all the
 terms? And so on...

 Best,
 Erick


 On Thu, Jul 31, 2014 at 10:26 AM, gorjida a...@sciencescape.net wrote:

  Thanks so much for your reply... In my case, it really matters because I
 am
  going to find the correct institution match for an affiliation string...
  For
  example, if an author belongs to the university of Toronto, his/her
  affiliation should be normalized against the solr... In this case,
  University of California Berkley Research is a different place to
  university of california berkeley... I see top-matches are tied in the
  score for this specific example... I can break the tie using other
  techniques... However, I am keen to see if this is a common problem in
  solr?
 
  Regards,
 
  Ali
 
 
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/Solr-gives-the-same-fieldnorm-for-two-different-size-fields-tp4150418p4150430.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 




-- 
---
Thanks  Regards
Umesh Prasad


Re: Searching words with spaces for word without spaces in solr

2014-08-02 Thread Umesh Prasad
 While using shingle in the query analyzer, the query ice cube creates three
 tokens as ice, cube, icecube. Only ice and cubes are searched but not
 icecubes, i.e. it is not working for the pair though I am using the shingle
 filter.

 Here's the schema config.

 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms_text_prime_index.txt" ignoreCase="true" expand="true"/>
     <charFilter class="solr.HTMLStripCharFilterFactory"/>
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true" tokenSeparator=""/>
     <filter class="solr.WordDelimiterFilterFactory" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="1" generateWordParts="1" generateNumberParts="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true" tokenSeparator=""/>
     <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
   </analyzer>
 </fieldType>

 Any help is appreciated.




-- 
---
Thanks  Regards
Umesh Prasad


Re: Bloom filter

2014-08-02 Thread Umesh Prasad
+1 to Guava's BloomFilter implementation.

You can actually hook into UpdateProcessor chain and have the logic of
updating bloom filter / checking there.

We had a somewhat similar use case. We were using DIH and it was possible
that the same solr input document (meaning same content) would come in lots of
times, leading to a lot of unnecessary updates in the index. I introduced a
DuplicateDetector in the update processor chain which kept a map of
unique ID -> solr doc hash code and dropped the document if it was a duplicate.
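
A hedged sketch of what such a duplicate-dropping update processor could look like. This is not the actual DuplicateDetector; the Guava BloomFilter, the "id" field name and the content-hash choice are assumptions for illustration.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

// Sketch: the Bloom filter gives a cheap "definitely not seen" check; the
// id -> content-hash map confirms a real duplicate before dropping the update.
public class DuplicateDroppingProcessor extends UpdateRequestProcessor {

    private final BloomFilter<CharSequence> seen =
            BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 10_000_000, 0.01);
    private final Map<String, Integer> idToContentHash = new HashMap<>();

    public DuplicateDroppingProcessor(UpdateRequestProcessor next) {
        super(next);
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
        // assumes every document carries an "id" field
        String id = cmd.getSolrInputDocument().getFieldValue("id").toString();
        // crude content hash for the sketch; a real implementation would hash field values properly
        int contentHash = cmd.getSolrInputDocument().toString().hashCode();

        if (seen.mightContain(id)) {
            Integer previous = idToContentHash.get(id);
            if (previous != null && previous == contentHash) {
                return;   // same id, same content: drop the update silently
            }
        }
        seen.put(id);
        idToContentHash.put(id, contentHash);
        super.processAdd(cmd);   // not a duplicate: pass it down the chain
    }
}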

There is a nice video of other usage of Update chain

https://www.youtube.com/watch?v=qoq2QEPHefo






On 30 July 2014 23:05, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

 You're right. I misunderstood. I thought that you wanted to optimize the
 finding by id path which is typically done for comparing versions during
 inserts in Solr.

 Yes, it won't help with the case where the ID does not exist.


 On Wed, Jul 30, 2014 at 6:14 PM, Per Steffensen st...@designware.dk
 wrote:

  Hi
 
  I am not sure exactly what LUCENE-5675 does, but reading the description
  it seems to me that it would help finding out that there is no document
  (having an id-field) where version-field is less than some-version. As
  far as I can see this will not help finding out if a document with
  id=some-id exists. We want to ask does a document with id some-id
  exist, without knowing the value of its version-field (if it actually
  exists). You do not know if it ever existed, either.
 
  Please elaborate. Thanks!
 
  Regarding  The only other choice today is bloom filters, which use up
  huge amounts of memory, I guess a bloom filter only takes as much space
  (disk or memory) as you want it to. The more space you allow it to use
 the
  more it gives you a false positive (saying this doc might exist in
 cases
  where the doc actually does not exist). So the space you need to use for
  the bloom filter depends on how frequently you can live with false
  positives (where you have to actually look it up in the real index).
 
  Regards, Per Steffensen
 
 
  On 30/07/14 10:05, Shalin Shekhar Mangar wrote:
 
  Hi Per,
 
  There's LUCENE-5675 which has added a new postings format for IDs.
 Trying
  it out in Solr is in my todo list but maybe you can get to it before me.
 
  https://issues.apache.org/jira/browse/LUCENE-5675
 
 
  On Wed, Jul 30, 2014 at 12:57 PM, Per Steffensen st...@designware.dk
  wrote:
 
   On 30/07/14 08:55, jim ferenczi wrote:
 
   Hi Per,
  First of all the BloomFilter implementation in Lucene is not exactly a
  bloom filter. It uses only one hash function and you cannot set the
  false
  positive ratio beforehand. ElasticSearch has its own bloom filter
  implementation (using guava like BloomFilter), you should take a
 look
  at
  their implementation if you really need this feature.
 
   Yes, I am looking into what Lucene can do and how to use it through
  Solr.
  If it does not fit our needs I will enhance it - potentially with
  inspiration from ES implementation. Thanks
 
What is your use-case ? If your index fits in RAM the bloom filter
  won't
 
  help (and it may have a negative impact if you have a lot of
 segments).
  In
  fact the only use case where the bloom filter can help is when your
 term
  dictionary does not fit in RAM which is rarely the case.
 
   We have so many documents that it will never fit in memory. We use
  optimistic locking (our own implementation) to do correct concurrent
  assembly of documents and to do duplicate control. This require a lot
 of
  finding docs from their id, and most of the time the document is not
  there,
  but to be sure we need to check both transactionlog and the actual
 index
  (UpdateLog). We would like to use Bloom Filter to quickly tell that a
  document with a particular id is NOT present.
 
   Regards,
  Jim
 
   Regards, Per Steffensen
 
 
 
 
 


 --
 Regards,
 Shalin Shekhar Mangar.




-- 
---
Thanks  Regards
Umesh Prasad


Re: Shuffle results a little

2014-08-02 Thread Umesh Prasad
What you are looking for is a distribution of search results. One way would be
a two-phase search:
Phase 1 : Search (with rows =0, No scoring, no grouping)
1. Find the groups (unique combinations) using pivot facets  (won't work in
distributed env yet)
2. Transform those groups into group.queries ..

Phase 2 : Actual search ( with group.queries )

Pros : Readily available and well tested.
Cons :  It will give you exact same number of results for each group, which
may not be desired. Specifically with pagination. And of course, you are
making two searches.
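
A rough SolrJ sketch of the two-phase flow, assuming hypothetical "tags" and "brand" fields; null checks and error handling are omitted.

import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.PivotField;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.util.NamedList;

public class TwoPhaseDistributionExample {
    public static QueryResponse search(SolrServer solr, String userQuery) throws SolrServerException {
        // Phase 1: find the unique (tags, brand) combinations, returning no documents.
        SolrQuery phase1 = new SolrQuery(userQuery);
        phase1.setRows(0);
        phase1.setFacet(true);
        phase1.add("facet.pivot", "tags,brand");
        NamedList<List<PivotField>> pivots = solr.query(phase1).getFacetPivot();

        // Phase 2: turn each combination into a group.query so every group gets its own top docs.
        SolrQuery phase2 = new SolrQuery(userQuery);
        phase2.set("group", true);
        phase2.set("group.limit", 3);
        for (PivotField tag : pivots.getVal(0)) {
            for (PivotField brand : tag.getPivot()) {
                phase2.add("group.query",
                        "tags:\"" + tag.getValue() + "\" AND brand:\"" + brand.getValue() + "\"");
            }
        }
        return solr.query(phase2);
    }
}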

The 2nd approach would be to implement this logic of distributing along different
dimensions as your own custom component. Solr's PostFilter / delegating
collector can be used for the same. Basically, TopDocCollector just maintains a
PriorityQueue of matching documents. You can plug in your own collector so
that it sees all matching documents, identifies which group each belongs to
(if the groups/pivots have already been identified), maintains a priority
queue for each of them and then finally merges them. Quite a bit of
customization if you ask me, but it can be done and it would be the most powerful.

PS : We use the 2nd approach.





On 30 July 2014 05:56, babenis babe...@gmail.com wrote:

 despite the fact that I upgrade to 4.9.0 - grouping doesn't seem to work on
 multi valued field, ie

 i was going to try to group by tags + brand (where tags is a multi-valued
 field) and spread results apart or select unique combinations only





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Shuffle-results-a-little-tp1891206p4149973.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
---
Thanks  Regards
Umesh Prasad


Re: To warm the whole cache of Solr other than the only autowarmcount

2014-08-02 Thread Umesh Prasad
@Erick : As you said, each use-case is different. We actually autowarm our
caches to 80% and we have a 99% hit ratio on the filter cache. For the query cache,
hit ratios are like 25%, but given that a cache hit saves us about 10X, we
strive to increase the cache hit ratio.

@Yang : You can't do a direct copy of values. Values are related to
lucene's internal document id and they can change during an index update.
The change can happen because of document being deleted, segments being
merged or new segments being created. Solr's caches refer to global doc id
which are even more prone to change (because of index merges).



On 28 July 2014 21:32, Erick Erickson erickerick...@gmail.com wrote:

 bq: autowarmcount=1024...

 That's the point, this is quite a high number in my
 experience.

 I've rarely seen numbers above 128 show much of
 any improvement. I've seen a large number of
 installations use much smaller autowarm numbers,
 as in the 16-32 range and be quite content.

 I _really_ recommend you try to use much smaller
 numbers then _measure_ whether the first few
 queries after a commit show unacceptable
 response times before trying to make things
 better. This really feels like premature
 optimization.

 Of course you know your problem space better than
 I do, it's just that I've spent too much of my
 professional life fixing the wrong problem; I've
 become something of a measure first curmudgeon.

 FWIW,
 Erick


 On Sun, Jul 27, 2014 at 10:48 PM, YouPeng Yang yypvsxf19870...@gmail.com
 wrote:

  Hi Erick
 
  We do the DIH job from the DB and committed frequently.It takes a long
 time
  to autowarm the filterCaches after commit or soft commit  happened when
  setting the autowarmcount=1024,which I do think is small enough.
  So It comes up an idea that whether it  could  directly pass the
 reference
  of the caches   over to the new caches so that the autowarm processing
 will
  take much fewer time .
 
 
 
  2014-07-28 2:30 GMT+08:00 Erick Erickson erickerick...@gmail.com:
 
   Why do you think you _need_ to autowarm the entire cache? It
   is, after all, an LRU cache, the theory being that the most recent
   queries are most likely to be reused.
  
   Personally I'd run some tests on using small autowarm counts
   before getting at all mixed up in some complex scheme that
   may not be useful at all. Say an autowarm count of 16. Then
   measure using that, then say 32 then... Insure you have a real
   problem before worrying about a solution! ;)
  
   Best,
   Erick
  
  
   On Fri, Jul 25, 2014 at 6:45 AM, Shawn Heisey s...@elyograg.org
 wrote:
  
On 7/24/2014 8:45 PM, YouPeng Yang wrote:
 To Matt

   Thank you,your opinion is very valuable ,So I have checked the
  source
 codes about how the cache warming  up. It seems to just put items
 of
   the
 old caches into the new caches.
   I will pull Mark Miller into this discussion.He is the one of the
 developer of the Solr whom  I had  contacted with.

  To Mark Miller

Would you please check out what we are discussing in the last
 two
 posts.I need your help.
   
Matt is completely right.  Any commit can drastically change the
 Lucene
document id numbers.  It would be too expensive to determine which
numbers haven't changed.  That means Solr must throw away all cache
information on commit.
   
Two of Solr's caches support autowarming.  Those caches use queries
 as
keys and results as values.  Autowarming works by re-executing the
 top
  N
queries (keys) in the old cache to obtain fresh Lucene document id
numbers (values).  The cache code does take *keys* from the old cache
for the new cache, but not *values*.  I'm very sure about this, as I
wrote the current (and not terribly good) LFUCache.
   
Thanks,
Shawn
   
   
  
 




-- 
---
Thanks  Regards
Umesh Prasad


Re: Implementing custom analyzer for multi-language stemming

2014-08-02 Thread Umesh Prasad
Also, take a look at the Lucid revolution talk Typed Index
https://www.youtube.com/watch?v=X93DaRfi790

 *Published on 25 Nov 2013*

Presented by Christoph Goller, Chief Scientist, IntraFind Software AG

If you want to search in a multilingual environment with high-quality
language-specific word-normalization, if you want to handle mixed-language
documents, if you want to add phonetic search for names if you need a
semantic search which distinguishes between a search for the color brown
and a person with the second name brown, in all these cases you have to
deal with different types of terms. I will show why it makes much more
sense to attach types (prefixes) to Lucene terms instead of relying on
different fields or even different indexes for different kinds of terms.
Furthermore I will show how queries to such a typed index look and why e.g.
SpanQueries are needed to correctly treat compound words and phrases or
realize a reasonable phonetic search. The Analyzers and the QueryParser
described are available as plugins for Lucene, Solr, and elasticsearch.




On 31 July 2014 00:34, Sujit Pal sujit@comcast.net wrote:

 Hi Eugene,

 In a system we built couple of years ago, we had a corpus of English and
 French mixed (and Spanish on the way but that was implemented by client
 after we handed off). We had different fields for each language. So (title,
 body) for English docs was (title_en, body_en), for French (title_fr,
 body_fr) and for Spanish (title_es, body_es) - each of these were
 associated with a different Analyzer (that was associated with the field
 types in schema.xml, in case of Lucene you can use
 PerFieldAnalyzerWrapper). Our pipeline used Google translate to detect the
 language and write the contents into the appropriate field set for the
 language. Our analyzers were custom - but Lucene/Solr provides analyzer
 chains for many major languages. You can find a list here:

 https://wiki.apache.org/solr/LanguageAnalysis

 -sujit



 On Wed, Jul 30, 2014 at 10:52 AM, Chris Morley ch...@depahelix.com
 wrote:

  I know BasisTech.com has a plugin for elasticsearch that extends
  stemming/lemmatization to work across 40 natural languages.
  I'm not sure what they have for Solr, but I think something like that may
  exist as well.
 
  Cheers,
  -Chris.
 
  
   From: Eugene beyondcomp...@gmail.com
  Sent: Wednesday, July 30, 2014 1:48 PM
  To: solr-user@lucene.apache.org
  Subject: Implementing custom analyzer for multi-language stemming
 
  Hello, fellow Solr and Lucene users and developers!
 
  In our project we receive text from users in different languages. We
  detect language automatically and use Google Translate APIs a lot (so
  having arbitrary number of languages in our system doesn't concern us).
  However we need to be able to search using stemming. Having nearly
 hundred
  of fields (several fields for each language with language-specific
  stemmers) listed in our search query is not an option. So we need a way
 to
  have a single index which has stemmed tokens for different languages. I
  have two questions:
 
  1. Are there already (third-party) custom multi-language stemming
  analyzers? (I doubt that no one else ran into this issue)
 
  2. If I'm going to implement such analyzer myself, could you please
  suggest a better way to 'pass' detected language value into such
 analyzer?
  Detecting language in analyzer itself is not an option, because: a) we
  already detect it in other place b) we do it based on combined values of
  many fields ('name', 'topic', 'description', etc.), while current field
  can
  be to short for reliable detection c) sometimes we just want to specify
  language explicitly. The obvious hack would be to prepend ISO 639-1 code
  to
  field value. But I'd like to believe that Solr allows for cleaner
  solution.
  I could think about either: a) custom query parameter (but I guess, it
  will
  require modifying request handlers, etc. which is highly undesirable) b)
  getting value from other field (we obviously have 'language' field and we
  do not have mixed-language records). If it is possible, could you please
  describe the mechanism for doing this or point to relevant code examples?
  Thank you very much and have a good day!
 
 




-- 
---
Thanks  Regards
Umesh Prasad


Re: Identify specific document insert error inside a solrj batch request

2014-08-02 Thread Umesh Prasad
)
 at
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
 at
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:370)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
 at
 org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:960)
 at
 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1021)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:957)

 -Original Message-
 From: Jack Krupansky [mailto:j...@basetechnology.com]
 Sent: Wednesday, July 30, 2014 5:53 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Identify specific document insert error inside a solrj batch
 request

 Agreed that this is a problem with Solr. If it was merely bad input,
 Solr should be returning a 4xx error.

 I don't know if we already have a Jira for this. If not, one should be
 filed.

 There are two issues:

 1. The status code should be 4xx with an appropriate message about bad
 input.

 2. The offset of the offending document should be reported so that the app
 can locate the problem to resolve it.

 Give us the actual server stack trace so we can verify whether this was
 simply user error or some defect in Solr itself.

 -- Jack Krupansky

 -Original Message-
 From: Liram Vardi
 Sent: Wednesday, July 30, 2014 9:25 AM
 To: solr-user@lucene.apache.org
 Subject: Identify specific document insert error inside a solrj batch
 request

 Hi All,

 I have a question regarding the use of HttpSolrServer (SolrJ).
 I have a collection of SolrInputDocuments I want to send to Solr as a
 batch.
 Now, let's assume that one of the docs inside this collection is corrupted
 (missing some required field).
 When I send the batch of docs to solr using HttpSolrServer.add(Collection
 SolrInputDocument docs) I am getting the following general exception:

 org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
 Server at http://172.23.3.91:8210/solr/template returned non ok
 status:500,
 message:Server Error

 When I check Solr log, I can identify exactly which is the corrupted
 document.

 My question:
 Is it possible to identify the problematic document at the client side?
 (for
 recovery purposes)

 Thanks,
 Liram


 Email secured by Check Point




-- 
---
Thanks  Regards
Umesh Prasad


Re: Mixing ordinary and nested documents

2014-07-22 Thread Umesh Prasad
// Build a child-doc -> parent-doc mapping once per searcher (cacheable until the next commit).
Query parentFilterQuery = new TermQuery(new Term("document_type", "parent"));

int[] childToParentDocMapping = new int[searcher.maxDoc()];
DocSet allParentDocSet = searcher.getDocSet(parentFilterQuery);
DocIterator iter = allParentDocSet.iterator();
int child = 0;
while (iter.hasNext()) {
    int parent = iter.nextDoc();
    // every doc id up to and including the parent belongs to this block
    while (child <= parent) {
        childToParentDocMapping[child] = parent;
        child++;
    }
}


On 22 July 2014 16:28, Bjørn Axelsen bjorn.axel...@fagkommunikation.dk
wrote:

 Thanks, Umesh

 You can get the parent bitset by running a the parent doc type query on
  the solr indexsearcher.
  Then child bitset by runnning the child doc type query. Then  use these
  together to create a int[] where int[i] = parent of i.
 

 Can you kindly add an example? I am not quite sure how to put this into a
 query?

 I can easily make the join from child to parent, but what I want to achieve
 is to get the parent document added to the result if it exists but maintain
 the scoring fromt the child as well as the full child document. Is this
 possible?

 Cheers,
 Bjørn

 2014-07-18 19:00 GMT+02:00 Umesh Prasad umesh.i...@gmail.com:

  Comments inline
 
 
  On 16 July 2014 20:31, Bjørn Axelsen bjorn.axel...@fagkommunikation.dk
  wrote:
 
   Hi Solr users
  
   I would appreciate your inputs on how to handle a *mix *of *simple *and
   *nested
   *documents in the most easy and flexible way.
  
   I need to handle:
  
  - simple documens: webpages, short articles etc. (approx. 90% of the
  content)
  - nested documents: books containing chapters etc. (approx 10% of
 the
  content)
  
  
 
 
   For simple documents I just want to present straightforward search
  results
   without any grouping etc.
  
   For the nested documents I want to group by book and show book title,
  book
   price etc. AND the individual results within the book. Lets say there
 is
  a
   hit on Chapters 1 and Chapter 7 within Book 1 and a hit on
 Article
   1, I would like to present this:
  
   *Book 1 title*
   Book 1 published date
   Book 1 description
   - *Chapter 1 title*
 Chapter 1 snippet
   - *Chapter 7 title*
 CHapter 7 snippet
  
   *Article 1 title*
   Article 1 published date
   Article 1 description
   Article 1 snippet
  
   It looks like it is pretty straightforward to use the CollapsingQParser
  to
   collapse the book results into one result and not to collapse the other
   results. But how about showing the information about the book (the
 parent
   document of the chapters)?
  
 
  You can map the child document to parent  doc id space and extract the
  information from parent doc id.
 
  First you need to generate child doc to parent doc id mapping one time.
You can get the parent bitset by running a the parent doc type query on
  the solr indexsearcher.
  Then child bitset by runnning the child doc type query. Then  use these
  together to create a int[] where int[i] = parent of i. This result is
  cachable till next commit. I am doing that for computing facets from
 fields
  in parent docs and sorting on values from parent docs (while getting
 child
  docs as output).
 
 
 
 
   1) Is there a way to do an* optional block join* to a *parent *document
  and
   return it together *with *the *child *document - but not to require a
   parent document?
  
   - or -
  
   2) Do I need to require parent-child documents for everything? This is
   really not my preferred strategy as only a small part of the documents
 is
   in a real parent-child relationship. This would mean a lot of dummy
 child
   documents.
  
  
 
  
   - or -
  
   3) Should I just denormalize data and include the book information
 within
   each chapter document?
  
   - or -
  
   4) ... or is there a smarter way?
  
   Your help is very much appreciated.
  
   Cheers,
  
   Bjørn Axelsen
  
 
 
 
  --
  ---
  Thanks  Regards
  Umesh Prasad
 




-- 
---
Thanks  Regards
Umesh Prasad


Re: Mixing ordinary and nested documents

2014-07-22 Thread Umesh Prasad
public static DocSet mapChildDocsToParentOnly(DocSet childDocSet) {

DocSet mappedParentDocSet = new BitDocSet();
DocIterator childIterator = childDocSet.iterator();
while (childIterator.hasNext()) {
int childDoc = childIterator.nextDoc();
int parentDoc = childToParentDocMapping[childDoc];
mappedParentDocSet.addUnique(parentDoc);
}
int[] matches = new int[mappedParentDocSet.size()];
DocIterator parentIter = mappedParentDocSet.iterator();
for (int i = 0; parentIter.hasNext(); i++) {
matches[i] = parentIter.nextDoc();
}
return new SortedIntDocSet(matches); // you will need
SortedIntDocSet impl else docset interaction in some facet queries fails
later.
}



On 22 July 2014 19:59, Umesh Prasad umesh.i...@gmail.com wrote:

 Query parentFilterQuery = new TermQuery(new Term("document_type", "parent"));

 int[] childToParentDocMapping = new int[searcher.maxDoc()];
 DocSet allParentDocSet = searcher.getDocSet(parentFilterQuery);
 DocIterator iter = allParentDocSet.iterator();
 int child = 0;
 while (iter.hasNext()) {
 int parent = iter.nextDoc();
 while (child <= parent) {
 childToParentDocMapping[child] = parent;
 child++;
 }
 }


 On 22 July 2014 16:28, Bjørn Axelsen bjorn.axel...@fagkommunikation.dk
 wrote:

 Thanks, Umesh

 You can get the parent bitset by running a the parent doc type query on
  the solr indexsearcher.
  Then child bitset by runnning the child doc type query. Then  use these
  together to create a int[] where int[i] = parent of i.
 

 Can you kindly add an example? I am not quite sure how to put this into a
 query?

 I can easily make the join from child to parent, but what I want to
 achieve
 is to get the parent document added to the result if it exists but
 maintain
 the scoring fromt the child as well as the full child document. Is this
 possible?

 Cheers,
 Bjørn

 2014-07-18 19:00 GMT+02:00 Umesh Prasad umesh.i...@gmail.com:

  Comments inline
 
 
  On 16 July 2014 20:31, Bjørn Axelsen bjorn.axel...@fagkommunikation.dk
 
  wrote:
 
   Hi Solr users
  
   I would appreciate your inputs on how to handle a *mix *of *simple
 *and
   *nested
   *documents in the most easy and flexible way.
  
   I need to handle:
  
  - simple documens: webpages, short articles etc. (approx. 90% of
 the
  content)
  - nested documents: books containing chapters etc. (approx 10% of
 the
  content)
  
  
 
 
   For simple documents I just want to present straightforward search
  results
   without any grouping etc.
  
   For the nested documents I want to group by book and show book title,
  book
   price etc. AND the individual results within the book. Lets say there
 is
  a
   hit on Chapters 1 and Chapter 7 within Book 1 and a hit on
 Article
   1, I would like to present this:
  
   *Book 1 title*
   Book 1 published date
   Book 1 description
   - *Chapter 1 title*
 Chapter 1 snippet
   - *Chapter 7 title*
 CHapter 7 snippet
  
   *Article 1 title*
   Article 1 published date
   Article 1 description
   Article 1 snippet
  
   It looks like it is pretty straightforward to use the
 CollapsingQParser
  to
   collapse the book results into one result and not to collapse the
 other
   results. But how about showing the information about the book (the
 parent
   document of the chapters)?
  
 
  You can map the child document to parent  doc id space and extract the
  information from parent doc id.
 
  First you need to generate child doc to parent doc id mapping one time.
You can get the parent bitset by running a the parent doc type query
 on
  the solr indexsearcher.
  Then child bitset by runnning the child doc type query. Then  use these
  together to create a int[] where int[i] = parent of i. This result is
  cachable till next commit. I am doing that for computing facets from
 fields
  in parent docs and sorting on values from parent docs (while getting
 child
  docs as output).
 
 
 
 
   1) Is there a way to do an* optional block join* to a *parent
 *document
  and
   return it together *with *the *child *document - but not to require a
   parent document?
  
   - or -
  
   2) Do I need to require parent-child documents for everything? This is
   really not my preferred strategy as only a small part of the
 documents is
   in a real parent-child relationship. This would mean a lot of dummy
 child
   documents.
  
  
 
  
   - or -
  
   3) Should I just denormalize data and include the book information
 within
   each chapter document?
  
   - or -
  
   4) ... or is there a smarter way?
  
   Your help is very much appreciated.
  
   Cheers,
  
   Bjørn Axelsen
  
 
 
 
  --
  ---
  Thanks  Regards
  Umesh Prasad
 




 --
 ---
 Thanks  Regards
 Umesh Prasad




-- 
---
Thanks

Re: Match query string within indexed field?

2014-07-19 Thread Umesh Prasad
Please ignore my earlier answer .. I had missed that you wanted match
spotting .. so that all the indexed terms must be present in the query ...

One way I can think of is SpanQueries .. But it won't be efficient and
won't scale to multiple fields ..

My suggestion would be to keep the mapping of keyword -> (field name, count)
in some key value store
and use it at query time to find the field name for the query terms ..












On 19 July 2014 02:34, prashantc88 prashant.chau...@searshc.com wrote:

 Hi,

 Thanks for the reply. Is there a better way to do it if the scenario is the
 following:

 Indexed values: abc def

 Query String:xy abc def z

 So basically the query string has to match all the words present in the
 indexed data to give a MATCH.




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Match-indexed-data-within-query-string-tp4147896p4147958.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
---
Thanks  Regards
Umesh Prasad


Re: Match query string within indexed field?

2014-07-19 Thread Umesh Prasad
*Span Queries for illustration :*
During Analysis : Inject a startSentinel and endSentinel in your indexed
field ..
So after analysis your field will look like ...
   start abc def end
Now at query time, you can programmatically expand your query clause to create
queries which will look like
 (start xyz end) OR (start abc end) OR ... basically all unigrams
 (start xyz abc end) OR (start abc def end) OR ... bigrams
and so on ...

Then for each of your clauses, you will need to generate a SpanQuery ...
The flexible query parser can help you here .. You will need to plug a custom
query builder in there ..

However, as you can see, ngram-based queries will result in a lot of
clauses, on the order of n^2, for just one field .. And if you are searching
across multiple fields then it will go to m * n^2 ..


On 20 July 2014 10:31, Umesh Prasad umesh.i...@gmail.com wrote:

 Please ignore my earlier answer .. I had missed that you wanted a match
 spotting .. So that all the indexed terms must be present in the query ...

 One way, I can think of is SpanQueries .. But it won't be efficient and
 won't scale to multiple fields ..

 My suggestion would be to  keep the mapping of keyword -- field name,
 count  mapping in some key value store
 and use it at query time to find field name for  query terms ..












 On 19 July 2014 02:34, prashantc88 prashant.chau...@searshc.com wrote:

 Hi,

 Thanks for the reply. Is there a better way to do it if the scenario is
 the
 following:

 Indexed values: abc def

 Query String:xy abc def z

 So basically the query string has to match all the words present in the
 indexed data to give a MATCH.




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Match-indexed-data-within-query-string-tp4147896p4147958.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 --
 ---
 Thanks  Regards
 Umesh Prasad




-- 
---
Thanks  Regards
Umesh Prasad


Re: solr boosting any perticular URL

2014-07-18 Thread Umesh Prasad
Or you can give a huge boost to the url at query time. If you are using dismax
then you can use bq,
like bq=myfield:"url1"^50 .. That will bring up url1 as the first
result always.
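
For example, with SolrJ this could look like the following sketch; the field name and boost value are placeholders.

import org.apache.solr.client.solrj.SolrQuery;

// Hypothetical per-request boost: push one specific url to the top with a large bq boost.
public class UrlBoostExample {
    public static SolrQuery build(String userQuery, String urlToPromote) {
        SolrQuery q = new SolrQuery(userQuery);
        q.set("defType", "edismax");
        q.add("bq", "url:\"" + urlToPromote + "\"^50");
        return q;
    }
}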



On 18 July 2014 15:27, benjelloun anass@gmail.com wrote:

 hello,

 before index the URL to a field in Solr, you can use java api(Solrj) and do
 a test
 if(URL==)
 index on  field1
 else
 index on field2


 then use edismax to boost a specific field:
 requestHandler name=/select class=solr.SearchHandler
 lst name=defaults
str name=echoParamsexplicit/str
int name=rows10/int
 str name=defTypeedismax/str
str name=qf
field1^5.0 field2^1.0
/str
 /requestHandler




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/solr-boosting-any-perticular-URL-tp4147657p4147864.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
---
Thanks  Regards
Umesh Prasad


Re: solr boosting any perticular URL

2014-07-18 Thread Umesh Prasad
PS : You can give huge boosts to a url at query time on a per-request basis.
Don't specify the bqs in solrconfig.xml .. Always determine and add the bqs for the
query at run time ..


On 18 July 2014 15:49, Umesh Prasad umesh.i...@gmail.com wrote:

 Or you can give huge boosts to url  at query time. If you are using dismax
 then you can use bq
 like bq = myfield:url1 ^ 50 .. That will bring up url1 as the
 first result always.



 On 18 July 2014 15:27, benjelloun anass@gmail.com wrote:

 hello,

 before index the URL to a field in Solr, you can use java api(Solrj) and
 do
 a test
 if(URL==)
 index on  field1
 else
 index on field2


 then use edismax to boost a specific field:
 requestHandler name=/select class=solr.SearchHandler
 lst name=defaults
str name=echoParamsexplicit/str
int name=rows10/int
 str name=defTypeedismax/str
str name=qf
field1^5.0 field2^1.0
/str
 /requestHandler




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/solr-boosting-any-perticular-URL-tp4147657p4147864.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 --
 ---
 Thanks  Regards
 Umesh Prasad




-- 
---
Thanks  Regards
Umesh Prasad


Re: Match query string within indexed field?

2014-07-18 Thread Umesh Prasad
You are looking for wildcard queries, but they can be quite costly and you
will need to benchmark performance ..
Especially suffix wildcard queries (of type *abc) are quite costly ..

You can convert a suffix query into a prefix query by using a
ReverseTokenFilter during index time analysis.
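
For illustration, a small sketch of the reversal idea at query time; the field names are assumptions, and the ReversedWildcardFilterFactory mentioned in the earlier thread is meant to handle this transparently inside Solr.

import org.apache.lucene.index.Term;
import org.apache.lucene.search.PrefixQuery;

// Sketch: a suffix search like *abc on field "name" becomes a cheap prefix query
// on a companion field "name_rev" whose tokens were reversed at index time.
public class ReversedSuffixQueryExample {
    public static PrefixQuery suffixAsPrefix(String suffix) {
        String reversed = new StringBuilder(suffix).reverse().toString(); // "abc" -> "cba"
        return new PrefixQuery(new Term("name_rev", reversed));           // matches cba*
    }
}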

A search on older mails will be useful ..
http://search-lucene.com/?q=wild+card+performance

Uwe's mail explains why performance optimization of Suffix wild card
queries is difficult ..
http://search-lucene.com/m/w1CAyxDpbC1/wild+card+performancesubj=Wild+Card+Query+Performance





On 18 July 2014 20:38, prashantc88 prashant.chau...@searshc.com wrote:

 Hi,

 My requirement is to give a match whenever a string is found within the
 indexed data of a field irrespective of where it is found.

 For example, if I have a field which is indexed with the data abc. Now
 any
 of the following query string must give a match: xyzabc,xyabc, abcxyz ..

 I am using *solr.KeywordTokenizerFactory* as the tokenizer class and
 *solr.LowerCaseFilterFactory* filter as index time in *schema.xml*.

 Could anyone help me out as to how I can achieve the functionality.

 Thanks in advance.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Match-query-string-within-indexed-field-tp4147896.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
---
Thanks  Regards
Umesh Prasad


Re: Mixing ordinary and nested documents

2014-07-18 Thread Umesh Prasad
Comments inline


On 16 July 2014 20:31, Bjørn Axelsen bjorn.axel...@fagkommunikation.dk
wrote:

 Hi Solr users

 I would appreciate your inputs on how to handle a *mix *of *simple *and
 *nested
 *documents in the most easy and flexible way.

 I need to handle:

- simple documens: webpages, short articles etc. (approx. 90% of the
content)
- nested documents: books containing chapters etc. (approx 10% of the
content)




 For simple documents I just want to present straightforward search results
 without any grouping etc.

 For the nested documents I want to group by book and show book title, book
 price etc. AND the individual results within the book. Lets say there is a
 hit on Chapters 1 and Chapter 7 within Book 1 and a hit on Article
 1, I would like to present this:

 *Book 1 title*
 Book 1 published date
 Book 1 description
 - *Chapter 1 title*
   Chapter 1 snippet
 - *Chapter 7 title*
   CHapter 7 snippet

 *Article 1 title*
 Article 1 published date
 Article 1 description
 Article 1 snippet

 It looks like it is pretty straightforward to use the CollapsingQParser to
 collapse the book results into one result and not to collapse the other
 results. But how about showing the information about the book (the parent
 document of the chapters)?


You can map the child documents to the parent doc-id space and extract the
information from the parent doc id.

First you need to generate the child-doc-to-parent-doc-id mapping once.
You can get the parent bitset by running the parent doc-type query on the
Solr IndexSearcher, and the child bitset by running the child doc-type
query. Then use these together to create an int[] where int[i] = parent of
i. This result is cacheable till the next commit. I am doing that for
computing facets from fields in parent docs and for sorting on values from
parent docs (while returning child docs as output). A sketch is below.
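A rough sketch of that mapping against the 4.x APIs (the doc-type field and
values are made up, and it assumes the books were indexed as blocks, i.e. a
parent follows its children in doc-id order):

  import java.util.Arrays;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.TermQuery;
  import org.apache.lucene.util.FixedBitSet;
  import org.apache.solr.search.DocIterator;
  import org.apache.solr.search.DocSet;
  import org.apache.solr.search.SolrIndexSearcher;

  public class ParentMapping {
      /** Builds int[] where result[childDocId] = parent doc id (-1 for non-children). */
      public static int[] build(SolrIndexSearcher searcher) throws java.io.IOException {
          DocSet parents  = searcher.getDocSet(new TermQuery(new Term("doc_type", "book")));
          DocSet children = searcher.getDocSet(new TermQuery(new Term("doc_type", "chapter")));

          FixedBitSet parentBits = new FixedBitSet(searcher.maxDoc());
          for (DocIterator it = parents.iterator(); it.hasNext(); ) {
              parentBits.set(it.nextDoc());
          }

          int[] parentOf = new int[searcher.maxDoc()];
          Arrays.fill(parentOf, -1);
          for (DocIterator it = children.iterator(); it.hasNext(); ) {
              int child = it.nextDoc();
              int parent = parentBits.nextSetBit(child); // first parent at or after the child
              if (parent >= 0) {
                  parentOf[child] = parent;
              }
          }
          return parentOf; // cache per searcher and regenerate on commit
      }
  }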




 1) Is there a way to do an* optional block join* to a *parent *document and
 return it together *with *the *child *document - but not to require a
 parent document?

 - or -

 2) Do I need to require parent-child documents for everything? This is
 really not my preferred strategy as only a small part of the documents is
 in a real parent-child relationship. This would mean a lot of dummy child
 documents.




 - or -

 3) Should I just denormalize data and include the book information within
 each chapter document?

 - or -

 4) ... or is there a smarter way?

 Your help is very much appreciated.

 Cheers,

 Bjørn Axelsen




-- 
---
Thanks  Regards
Umesh Prasad


Re: How do I get faceting to work with Solr JOINs

2014-07-17 Thread Umesh Prasad
Hi Vinay,

You can customize the FacetComponent. Basically FacetComponent uses
SimpleFacets to compute the facet counts. It passes the matched docset
present in the ResponseBuilder to SimpleFacets' constructor.

1.  Build a mapping between the parent space and the auxiliary document
space (say an int array) and cache it in your own custom cache in
SolrIndexSearcher. You will need to rebuild this mapping on every commit,
so you have to define a CacheRegenerator for that (see the cache entry
sketched below).

2.  You can map the matched docset (which is in parent space) to the
auxiliary document space.
 The catch is that facets from non-matching auxiliary docs would also
be counted.

3. You can then pass on this mapped auxiliary document to SimpleFacets for
faceting.

I have been doing something similar for our needs. Basically, we have a
parent document with text attributes that changes very little, and we have
child documents with inventory attributes which change extremely fast. The
search results require child documents, but faceting has to be done on the
text attributes which belong to the parents. So we do this mapping by
customizing the FacetComponent.
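The custom cache entry in solrconfig.xml could look roughly like this (the
cache name, sizes and regenerator class are made up; the regenerator would
rebuild the int[] mapping against the new searcher on commit):

  <query>
    <!-- custom user cache holding the parent <-> auxiliary doc-id mapping -->
    <cache name="parentAuxMapping"
           class="solr.LRUCache"
           size="16"
           initialSize="4"
           autowarmCount="4"
           regenerator="com.example.ParentAuxMappingRegenerator"/>
  </query>

The component can then look it up per request via
searcher.getCache("parentAuxMapping").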






On 18 July 2014 04:11, Vinay B, vybe3...@gmail.com wrote:

 Some Background info :
 In our application, we have a requirement to update large number of records
 often.  I investigated solr child documents but it requires updating both
 the child and the parent document . Therefore, I'm investigating adding
 frequently updated information in an auxillary document with a custom
 defined parent-id field that can be used to join with the static parent
 document. - basically rolling my own child document functionality.

 This approach has satisfied all my requirements, except one. How can I
 facet upon a field present in the auxillary document?

 First, here's a gist dump of my test core index (4 docs + 4 aux docs)
 https://gist.github.com/anonymous/2774b54e667778c71492

 Next, here's a simple facet query only on the aux . While this works, it
 only returns auxillary documents
 https://gist.github.com/anonymous/a58b87576b895e467c68

 Finally, I tweak the query using a SOLR join (
 https://wiki.apache.org/solr/Join ) to return the main documents (which it
 does), but the faceting returns no results. This is what I'm hoping someone
 on this list can answer .
 Here is the gist of that query
 https://gist.github.com/anonymous/f3a287ab726f35b142cf

 Any answers, suggestions ?

 Thanks




-- 
---
Thanks  Regards
Umesh Prasad


Re: Memory leak for debugQuery?

2014-07-17 Thread Umesh Prasad
A histogram by itself isn't sufficient to root-cause the JVM heap issue.
We have found JVM heap memory issues multiple times in our system, and each
time it was due to a different reason. I would recommend taking heap dumps
at regular intervals (using jmap / VisualVM) and analyzing those heap
dumps. That will give a definite answer to memory issues.

 I have regularly analyzed heap dumps of 32 GB size with Eclipse Memory
Analyzer. The Linux version comes with a command-line script
ParseHeapDump.sh inside the mat directory.

# Usage: ParseHeapDump.sh path/to/dump.hprof [report]*
#
# The leak report has the id org.eclipse.mat.api:suspects
# The top component report has the id org.eclipse.mat.api:top_components
Increase the memory by setting Xmx and Xms param in MemoryAnalyzer.ini (in
same directory).

The leak suspect report is quite good. For checking detailed allocation
pattern etc , you can copy the index files generated from parsing and open
it in GUI.
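For example (the pid and paths are placeholders):

  jmap -dump:live,format=b,file=/tmp/solr-heap.hprof <solr-pid>
  ./ParseHeapDump.sh /tmp/solr-heap.hprof org.eclipse.mat.api:suspects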




On 17 July 2014 05:36, Tomás Fernández Löbbe tomasflo...@gmail.com wrote:

 Also, is this trunk? Solr 4.x? Single shard, right?


 On Wed, Jul 16, 2014 at 2:24 PM, Erik Hatcher erik.hatc...@gmail.com
 wrote:

  Tom -
 
  You could maybe isolate it a little further by seeing using the “debug
  parameter with values of timing|query|results
 
  Erik
 
  On May 15, 2014, at 5:50 PM, Tom Burton-West tburt...@umich.edu wrote:
 
   Hello all,
  
   I'm trying to get relevance scoring information for each of 1,000 docs
  returned for each of 250 queries.If I run the query (appended below)
  without debugQuery=on, I have no problem with getting all the results
 with
  under 4GB of memory use.  If I add the parameter debugQuery=on, memory
 use
  goes up continuously and after about 20 queries (with 1,000 results
 each),
  memory use reaches about 29.1 GB and the garbage collector gives up:
  
org.apache.solr.common.SolrException;
 null:java.lang.RuntimeException:
  java.lang.OutOfMemoryError: GC overhead limit exceeded
  
   I've attached a jmap -histo, exgerpt below.
  
   Is this a known issue with debugQuery?
  
   Tom
   
   query:
  
  
 
 q=Abraham+Lincolnfl=id,scoreindent=onwt=jsonstart=0rows=1000version=2.2debugQuery=on
  
   without debugQuery=on:
  
  
 
 q=Abraham+Lincolnfl=id,scoreindent=onwt=jsonstart=0rows=1000version=2.2
  
   num   #instances#bytes  Class description
  
 
 --
   1:  585,559 10,292,067,456  byte[]
   2:  743,639 18,874,349,592  char[]
   3:  53,821  91,936,328  long[]
   4:  70,430  69,234,400  int[]
   5:  51,348  27,111,744
   org.apache.lucene.util.fst.FST$Arc[]
   6:  286,357 20,617,704
   org.apache.lucene.util.fst.FST$Arc
   7:  715,364 17,168,736  java.lang.String
   8:  79,561  12,547,792  * ConstMethodKlass
   9:  18,909  11,404,696  short[]
   10: 345,854 11,067,328  java.util.HashMap$Entry
   11: 8,823   10,351,024  * ConstantPoolKlass
   12: 79,561  10,193,328  * MethodKlass
   13: 228,587 9,143,480
  org.apache.lucene.document.FieldType
   14: 228,584 9,143,360
 org.apache.lucene.document.Field
   15: 368,423 8,842,152   org.apache.lucene.util.BytesRef
   16: 210,342 8,413,680   java.util.TreeMap$Entry
   17: 81,576  8,204,648   java.util.HashMap$Entry[]
   18: 107,921 7,770,312
  org.apache.lucene.util.fst.FST$Arc
   19: 13,020  6,874,560
  org.apache.lucene.util.fst.FST$Arc[]
  
   debugQuery_jmap.txt
 
 




-- 
---
Thanks  Regards
Umesh Prasad


Re: SOLR Performance benchmarking

2014-07-13 Thread Umesh Prasad
Hi Rashi,
Also, checkout
http://searchhub.org/2010/01/21/the-seven-deadly-sins-of-solr/ ..
It would help if you could share your solrconfig.xml and schema.xml; some
problems are evident from those alone.

 From our experience we have found:
1. JVM heap size (check the young gen size and new/old ratio; the default is
very low for prod setups).
2. Solr cache tuning, as Siegfried pointed out. There are 4 caches:
queryCache, filterCache, documentCache and fieldValueCache. Make sure the
caches get populated by defining a newSearcher warm-up listener and that
autowarmCount is properly configured (see the snippet below).
3. About long-running queries, the Solr core logs are your friend; analyze
the QTime percentiles.
The list of reasons here is big. A few that we have found to be killers for
performance are
 a) A query-time analyzer chain of synonym filter -> stemmer -> synonym
filter had resulted in something like 50 * 50 = 2500 terms for a single
query term for us.
 b) ngroups and group.truncate are quite costly, especially if the field
has large cardinality. And these aren't cached.
 c) Faceting/filtering on timestamp fields (with arbitrary accuracy).
 d) Deep paging.
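For reference, the relevant solrconfig.xml bits look roughly like this (the
sizes and the warming query are illustrative only):

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="64"/>

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <!-- a few typical queries so a fresh searcher starts with warm caches -->
      <lst><str name="q">*:*</str><str name="fq">category:books</str><str name="sort">price asc</str></lst>
    </arr>
  </listener>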




On 13 July 2014 14:48, Siegfried Goeschl sgoes...@gmx.at wrote:

 Hi Rashi,

 abnormal behaviour depends on your data, system and work load - I have
 seen abnormal behaviour at customers sites and it turned out to be a
 miracle that they the customer had no serious problems before :-)

 * running out of sockets - you might need to check if you have enough
 sockets (system quota) and that the sockets are closed properly (mostly a
 Windows/networking issue - CLOSED_WAIT)
 * understand your test setup - usually a test box is much smaller in terms
 of CPU/memory than you production box
 ** you might be forced to tweak your test configuration (e.g. production
 SOLR cache configuration can overwhelm a small server)
 * understand your work-load
 ** if you have long-running queries within your performance tests they
 tend to bring down your server under high-load and your “abnormal”
 condition looks very normal at hindsight
 ** spot your long-running queries, optimise them, re-run your tests
 ** check your cache warming and how fast you start your load injector
 threads

 Cheers,

 Siegfried Goeschl


 On 13 Jul 2014, at 09:53, rashi gandhi gandhirash...@gmail.com wrote:

  Hi,
 
  I am using SolrMeter for load/stress testing solr performance.
  Tomcat is configured with default maxThreads (i.e. 200).
 
  I set Intended Request per min in SolrMeter to 1500 and performed
 testing.
 
  I found that sometimes it works with this much load on solr but sometimes
  it gives error Sever Refused Connection in solr.
  On getting this error, i increased maxThreads to some higher value, and
  then it works again.
 
  I would like to know why solr is behaving abnormally, initially when it
 was
  working with maxThreads=200.
 
  Please provide me some pointers.




-- 
---
Thanks  Regards
Umesh Prasad


Re: Group only top 50 results not All results.

2014-07-13 Thread Umesh Prasad
Another way is to extend the existing facet component. FacetComponent
uses SimpleFacets to compute facets, and it passes the matching docset
(rb.getResults().docSet) as a constructor argument. Instead you can pass
it a docset built from the ranked top-N docList (rb.getResults().docList).

Basically 3 steps:
1. Develop your custom facet component (a sketch is given below).
For reference you can look at the source code of FacetComponent:
https://github.com/apache/lucene-solr/blob/d49f297a4c7ab2c518717fa5a6ceeeda222349c3/solr/core/src/java/org/apache/solr/handler/component/FacetComponent.java
(line 79 - 82)

2.  Register the Extended FacetComponent as custom component in
solrconfig.xml
It will look something like

  <searchComponent name="myfacet"
                   class="com.flipkart.solr.handler.component.MyFacetComponent" />

3. Call that as part of your custom request handler pipeline.
<arr name="last-components">
  <str>myfacet</str>
</arr>

You can check
http://sujitpal.blogspot.in/2011/04/custom-solr-search-components-2-dev.html
for a sample.
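A rough sketch of step 1 against the 4.x APIs (the class name is made up;
the idea is simply to swap the full docSet for a docset built from the
ranked docList before the stock faceting code runs):

  import java.io.IOException;
  import org.apache.solr.handler.component.FacetComponent;
  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.search.DocIterator;
  import org.apache.solr.search.DocListAndSet;
  import org.apache.solr.search.DocSet;
  import org.apache.solr.search.HashDocSet;

  public class TopNFacetComponent extends FacetComponent {
      @Override
      public void process(ResponseBuilder rb) throws IOException {
          DocListAndSet results = rb.getResults();
          if (results == null || results.docList == null) {
              super.process(rb);
              return;
          }
          // build a DocSet containing only the ranked (top rows) documents
          int[] ids = new int[results.docList.size()];
          int i = 0;
          for (DocIterator it = results.docList.iterator(); it.hasNext(); ) {
              ids[i++] = it.nextDoc();
          }
          DocSet fullSet = results.docSet;
          results.docSet = new HashDocSet(ids, 0, i);  // facet only over the top N
          try {
              super.process(rb);                       // stock FacetComponent does the counting
          } finally {
              results.docSet = fullSet;                // restore for any later components
          }
      }
  }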





On 13 July 2014 00:02, Joel Bernstein joels...@gmail.com wrote:

 I agree with Alex a PostFilter would work. But it would be a somewhat
 tricky PostFilter to write. You would need to collect the top 50 documents
 using a priority queue in the DelegatingCollector.collect() method. Then in
 the DelegatingCollector.finish() method you would send the top documents to
 the lower collectors. Grouping supports PostFilters so this should work
 with Grouping or you could use the CollapsingQParserPlugin.

 Joel Bernstein
 Search Engineer at Heliosearch


 On Sat, Jul 12, 2014 at 1:31 PM, Alexandre Rafalovitch arafa...@gmail.com
 
 wrote:

  I don't think either grouping or faceting work as postfilter.
  Otherwise, that would be one way. Have a custom post-filter that only
  allows top 50 documents and have grouping run as an even-higher-cost
  postfilter after that.
 
  Regards,
 Alex.
  Personal: http://www.outerthoughts.com/ and @arafalov
  Solr resources: http://www.solr-start.com/ and @solrstart
  Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
 
 
  On Sat, Jul 12, 2014 at 11:49 PM, Erick Erickson
  erickerick...@gmail.com wrote:
   You could also return the top 50 groups. That will certainly contain
 the
  top
   50 responses. The app layer could then do some local sorting to figure
   out what was correct. Maybe you'd be returning 3 docs in each or
  something...
  
   I'd probably only go there if Michael's approach didn't work out
 though.
  
   On Fri, Jul 11, 2014 at 10:52 AM, Michael Ryan mr...@moreover.com
  wrote:
   I suggest doing this in two queries. In the first query, retrieve the
  unique ids of the top 50 documents. In the second query, just query for
  those ids (e.g., q=ids:(2 13 55 62 81)), and add the facet parameters on
  that query.
  
   -Michael
  
   -Original Message-
   From: Aaron Gibbons [mailto:agibb...@synergydatasystems.com]
   Sent: Friday, July 11, 2014 1:46 PM
   To: solr-user@lucene.apache.org
   Subject: Group only top 50 results not All results.
  
   I'm trying to figure out how I can query solr for the top X results
  THEN group and count only those top 50 by their owner.
  
   I can run a query to get the top 50 results that I want.
   solr/select?q=(current_position_title%3a(TEST))rows=50
  
   I've tried Faceting but I get all results faceted not just the top 50:
  
 
 solr/select?q=(current_position_title%3a(TEST))start=0rows=50facet=truefacet.field=recruiterkeyidfacet.limit=-1facet.mincount=1facet.sort=true
  
   I've tried Grouping and get all results again grouped not just the top
  50.
  
 
 solr/select?q=(current_position_title%3a(TEST))rows=50group=truegroup.field=recruiterkeyidgroup.limit=1group.format=groupedversion=2.2
  
   I could also run one search to get the top X record Id's then run a
  second Grouped query on those but I was hoping there was a less expensive
  way run the search.
  
   So what I need to get back are the distinct recruiterkeyid's from the
  top X query and the count of how many there are only in the top X
 results.
   I'll ultimately want to query the results for each of individual
  recruiterkeyid as well.  I'm using SolrNet to build the query.
  
   Thank you for your help,
   Aaron
 




-- 
---
Thanks  Regards
Umesh Prasad


Re: Changing default behavior of solr for overwrite the whole document on uniquekey duplication

2014-07-13 Thread Umesh Prasad
Must mention here: atomic updates will only work if all your fields are
stored. They ease the work on your part, but the stored fields will bloat
the index.
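For illustration (the core name, id and field below are made up), a
set-style atomic update that only succeeds when the document already
exists:

  curl 'http://localhost:8983/solr/collection1/update?commit=true' \
       -H 'Content-Type: application/json' \
       -d '[{"id":"DOC1", "_version_":1, "price":{"set":499}}]'

If the document does not exist, this fails with an HTTP 409, matching the
flow in the quoted thread (and _version_=-1 on the initial add rejects
duplicates the same way).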





On 12 July 2014 22:06, Erick Erickson erickerick...@gmail.com wrote:

 bq: But does performance remain same in this situation

 No. Some documents will require two calls to be indexed. And you'll
 be sending one document at a time rather than batching them up.
 Of course it'll be slower. But will it still be fast enough? Only you can
 answer that.

 If it's _really_ a problem, you could consider using a custom update
 processor
 plugin that does all this on the server side. This would not require you to
 change Solr code, just write a relatively small bit of code and use the
 plugin infrastructure.

 Best,
 Erick

 On Thu, Jul 10, 2014 at 1:56 PM, Ali Nazemian alinazem...@gmail.com
 wrote:
  Thank you very much. Now I understand what was the idea. It is better
 than
  changing Solr. But does performance remain same in this situation?
 
 
  On Tue, Jul 8, 2014 at 10:43 PM, Chris Hostetter 
 hossman_luc...@fucit.org
  wrote:
 
 
  I think you are missunderstanding what Himanshu is suggesting to you.
 
  You don't need to make lots of big changes ot the internals of solr's
 code
  to get what you want -- instead you can leverage the Atomic Updates 
  Optimistic Concurrency features of Solr to get the existing internal
 Solr
  to reject any attempts to add a duplicate documentunless the client code
  sending the document specifies it should be an update.
 
  This means your client code needs to be a bit more sophisticated, but
 the
  benefit is that you don't have to try to make complex changes to the
  internals of Solr that may be impossible and/or difficult to
  support/upgrade later.
 
  More details...
 
 
 
 https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-OptimisticConcurrency
 
  Simplest possible idea based on the basic info you have given so far...
 
  1) send every doc using _version_=-1
  2a) if doc update fails with error 409, that means a version of this doc
  already exists
  2b) resend just the field changes (using set atomic
  operation) and specify _version_=1
 
 
 
  : Dear Himanshu,
  : Hi,
  : You misunderstood what I meant. I am not going to update some field.
 I am
  : going to change what Solr do on duplication of uniquekey field. I dont
  want
  : to solr overwrite Whole document I just want to overwrite some parts
 of
  : document. This situation does not come from user side this is what
 solr
  do
  : to documents with duplicated uniquekey.
  : Regards.
  :
  :
  : On Tue, Jul 8, 2014 at 12:29 PM, Himanshu Mehrotra 
  : himanshu.mehro...@snapdeal.com wrote:
  :
  :  Please look at https://wiki.apache.org/solr/Atomic_Updates
  : 
  :  This does what you want just update relevant fields.
  : 
  :  Thanks,
  :  Himanshu
  : 
  : 
  :  On Tue, Jul 8, 2014 at 1:09 PM, Ali Nazemian alinazem...@gmail.com
 
  :  wrote:
  : 
  :   Dears,
  :   Hi,
  :   According to my requirement I need to change the default behavior
 of
  Solr
  :   for overwriting the whole document on unique-key duplication. I am
  going
  :  to
  :   change that the overwrite just part of document (some fields) and
  other
  :   parts of document (other fields) remain unchanged. First of all I
  need to
  :   know such changing in Solr behavior is possible? Second, I really
  :   appreciate if you can guide me through what class/classes should I
  :  consider
  :   for changing that?
  :   Best regards.
  :  
  :   --
  :   A.Nazemian
  :  
  : 
  :
  :
  :
  : --
  : A.Nazemian
  :
 
  -Hoss
  http://www.lucidworks.com/
 
 
 
 
  --
  A.Nazemian




-- 
---
Thanks  Regards
Umesh Prasad


Re: SOLR-6143 Bad facet counts from CollapsingQParserPlugin

2014-07-13 Thread Umesh Prasad
Hi Joel,
 Actually I have also seen this. The facet counts given by group.truncate
and the CollapsingQParserPlugin differ. We have a golden-query framework for
our product APIs, and there we have seen differences in the facet counts:
one request uses group.truncate and the other the CollapsingQParserPlugin,
and the counts differ (by a small margin).
I haven't been able to isolate the issue down to a unit test, so I
haven't raised a bug.
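For reference, the two request shapes being compared look roughly like this
(the field names are made up):

  q=shoes&group=true&group.field=item_id&group.truncate=true&facet=true&facet.field=brand
  q=shoes&fq={!collapse field=item_id}&facet=true&facet.field=brand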




On 12 July 2014 08:57, Joel Bernstein joels...@gmail.com wrote:

 The CollapsingQParserPlugin currently supports facet counts that match
 group.truncate. This works great for some use cases.

 There are use cases though where group.facets counts are preferred. No
 timetable yet on adding this feature for the CollapsingQParserPlugin.

 Joel Bernstein
 Search Engineer at Heliosearch


 On Thu, Jul 10, 2014 at 7:20 PM, shamik sham...@gmail.com wrote:

  Are there any plans to release this feature anytime soon ? I think this
 is
  pretty important as a lot of search use case are dependent on the facet
  count being returned by the search result. This issue renders renders the
  CollapsingQParserPlugin pretty much unusable. I'm now reverting back to
 the
  old group query (painfully slow) since I can't use the facet count
 anymore.
 
 
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/RE-SOLR-6143-Bad-facet-counts-from-CollapsingQParserPlugin-tp4140455p4146645.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 




-- 
---
Thanks  Regards
Umesh Prasad


Re: CollapsingQParserPlugin throws Exception when useFilterForSortedQuery=true

2014-07-02 Thread Umesh Prasad
Created the jira ..
https://issues.apache.org/jira/browse/SOLR-6222



On 30 June 2014 23:53, Joel Bernstein joels...@gmail.com wrote:

 Sure, go ahead create the ticket. I think there is more we can here as
 well. I suspect we can get the CollapsingQParserPlugin to work with
 useFilterForSortedQuery=true if scoring is not needed for the collapse.
 I'll take a closer look at this.

 Joel Bernstein
 Search Engineer at Heliosearch


 On Mon, Jun 30, 2014 at 1:43 AM, Umesh Prasad umesh.i...@gmail.com
 wrote:

  Hi Joel,
  Thanks a lot for clarification ..  An error message would indeed be a
  good thing ..   Should I open a jira item for same ?
 
 
 
  On 28 June 2014 19:08, Joel Bernstein joels...@gmail.com wrote:
 
   OK, I see the problem. When you use useFilterForSortedQuery true
   /useFilterForSortedQuery Solr builds a docSet in a way that seems to
 be
   incompatible with the CollapsingQParserPlugin. With
   useFilterForSortedQuery
   true /useFilterForSortedQuery, Solr doesn't run the main query again
  when
   collecting the DocSet. The getDocSetScore() method is expecting the
 main
   query to present, because the CollapsingQParserPlugin may need the
 scores
   generated from the main query, to select the group head.
  
   I think trying to make useFilterForSortedQuery true
   /useFilterForSortedQuery compatible with CollapsingQParsePlugin is
   probably not possible. So, a nice error message would be a good thing.
  
   Joel Bernstein
   Search Engineer at Heliosearch
  
  
   On Tue, Jun 24, 2014 at 3:31 AM, Umesh Prasad umesh.i...@gmail.com
   wrote:
  
Hi ,
Found another bug with CollapsignQParserPlugin. Not a critical
 one.
   
It throws an exception when used with
   
useFilterForSortedQuery true /useFilterForSortedQuery
   
Patch attached (against 4.8.1 but reproducible in other branches
 also)
   
   
518 T11 C0 oasc.SolrCore.execute [collection1] webapp=null path=null
   
  
 
 params={q=*%3A*fq=%7B%21collapse+field%3Dgroup_s%7DdefType=edismaxbf=field%28test_ti%29}
hits=2 status=0 QTime=99
4557 T11 C0 oasc.SolrCore.execute [collection1] webapp=null path=null
   
  
 
 params={q=*%3A*fq=%7B%21collapse+field%3Dgroup_s+nullPolicy%3Dexpand+min%3Dtest_tf%7DdefType=edismaxbf=field%28test_ti%29sort=}
hits=4 status=0 QTime=15
4587 T11 C0 oasc.SolrException.log ERROR
java.lang.UnsupportedOperationException: Query  does not implement
createWeight
at org.apache.lucene.search.Query.createWeight(Query.java:80)
at
   
  
 
 org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:684)
at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
at
   
  
 
 org.apache.solr.search.SolrIndexSearcher.getDocSetScore(SolrIndexSearcher.java:879)
at
   
  
 
 org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:902)
at
   
  
 
 org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1381)
at
   
  
 
 org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:478)
at
   
  
 
 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:461)
at
   
  
 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
at
   
  
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
at
 org.apache.solr.util.TestHarness.query(TestHarness.java:295)
at
 org.apache.solr.util.TestHarness.query(TestHarness.java:278)
at
   org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:676)
at
   org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:669)
at
   
  
 
 org.apache.solr.search.TestCollapseQParserPlugin.testCollapseQueries(TestCollapseQParserPlugin.java:106)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
 Method)
at
   
  
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
   
  
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
   
  
 
 com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at
   
  
 
 com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at
   
  
 
 com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at
   
  
 
 com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at
   
  
 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53

Re: how to log ngroups

2014-06-30 Thread Umesh Prasad
Hi Aman,
You can implement and register a last-component which extracts ngroups
from the grouped response and adds it to the response's "toLog" section, so
that it shows up in the request log alongside hits.
You can check out a tutorial about SearchComponents here:
http://sujitpal.blogspot.in/2011/04/custom-solr-search-components-2-dev.html
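A rough sketch of such a component (the class name is made up; it assumes
the standard grouped response layout and group.ngroups=true on the
request):

  import java.io.IOException;
  import org.apache.solr.common.util.NamedList;
  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.handler.component.SearchComponent;

  public class NGroupsLogComponent extends SearchComponent {
      @Override
      public void prepare(ResponseBuilder rb) throws IOException {}

      @Override
      @SuppressWarnings("rawtypes")
      public void process(ResponseBuilder rb) throws IOException {
          NamedList grouped = (NamedList) rb.rsp.getValues().get("grouped");
          if (grouped == null) return;
          for (int i = 0; i < grouped.size(); i++) {
              Object ngroups = ((NamedList) grouped.getVal(i)).get("ngroups");
              if (ngroups != null) {
                  // ends up on the standard request log line, e.g. ngroups(group_field)=1234
                  rb.rsp.getToLog().add("ngroups(" + grouped.getName(i) + ")", ngroups);
              }
          }
      }

      @Override
      public String getDescription() { return "Logs ngroups per grouped field"; }

      @Override
      public String getSource() { return null; }
  }

Register it as a searchComponent and list it under the last-components of
your request handler, as in the tutorial above.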





On 29 June 2014 20:31, Aman Tandon amantandon...@gmail.com wrote:

 Any help here?

 With Regards
 Aman Tandon


 On Thu, Jun 26, 2014 at 7:32 PM, Aman Tandon amantandon...@gmail.com
 wrote:

  Hi,
 
  I am grouping in my results and also applying the group limit. Is there
 is
  any way to log the ngroups as well along with hits.
 




-- 
---
Thanks  Regards
Umesh Prasad


Re: CollapsingQParserPlugin throws Exception when useFilterForSortedQuery=true

2014-06-29 Thread Umesh Prasad
Hi Joel,
Thanks a lot for clarification ..  An error message would indeed be a
good thing ..   Should I open a jira item for same ?



On 28 June 2014 19:08, Joel Bernstein joels...@gmail.com wrote:

 OK, I see the problem. When you use useFilterForSortedQuery true
 /useFilterForSortedQuery Solr builds a docSet in a way that seems to be
 incompatible with the CollapsingQParserPlugin. With
 useFilterForSortedQuery
 true /useFilterForSortedQuery, Solr doesn't run the main query again when
 collecting the DocSet. The getDocSetScore() method is expecting the main
 query to present, because the CollapsingQParserPlugin may need the scores
 generated from the main query, to select the group head.

 I think trying to make useFilterForSortedQuery true
 /useFilterForSortedQuery compatible with CollapsingQParsePlugin is
 probably not possible. So, a nice error message would be a good thing.

 Joel Bernstein
 Search Engineer at Heliosearch


 On Tue, Jun 24, 2014 at 3:31 AM, Umesh Prasad umesh.i...@gmail.com
 wrote:

  Hi ,
  Found another bug with CollapsignQParserPlugin. Not a critical one.
 
  It throws an exception when used with
 
  useFilterForSortedQuery true /useFilterForSortedQuery
 
  Patch attached (against 4.8.1 but reproducible in other branches also)
 
 
  518 T11 C0 oasc.SolrCore.execute [collection1] webapp=null path=null
 
 params={q=*%3A*fq=%7B%21collapse+field%3Dgroup_s%7DdefType=edismaxbf=field%28test_ti%29}
  hits=2 status=0 QTime=99
  4557 T11 C0 oasc.SolrCore.execute [collection1] webapp=null path=null
 
 params={q=*%3A*fq=%7B%21collapse+field%3Dgroup_s+nullPolicy%3Dexpand+min%3Dtest_tf%7DdefType=edismaxbf=field%28test_ti%29sort=}
  hits=4 status=0 QTime=15
  4587 T11 C0 oasc.SolrException.log ERROR
  java.lang.UnsupportedOperationException: Query  does not implement
  createWeight
  at org.apache.lucene.search.Query.createWeight(Query.java:80)
  at
 
 org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:684)
  at
  org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
  at
 
 org.apache.solr.search.SolrIndexSearcher.getDocSetScore(SolrIndexSearcher.java:879)
  at
 
 org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:902)
  at
 
 org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1381)
  at
 
 org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:478)
  at
 
 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:461)
  at
 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
  at
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
  at org.apache.solr.util.TestHarness.query(TestHarness.java:295)
  at org.apache.solr.util.TestHarness.query(TestHarness.java:278)
  at
 org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:676)
  at
 org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:669)
  at
 
 org.apache.solr.search.TestCollapseQParserPlugin.testCollapseQueries(TestCollapseQParserPlugin.java:106)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at
 
 com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
  at
 
 com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
  at
 
 com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
  at
 
 com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
  at
 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
  at
 
 org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
  at
 
 org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
  at
 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
  at
 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
  at
 
 org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
  at
 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
  at
 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48

Re: Bug in Collapsing QParserPlugin : Sort by 3 or more fields is broken

2014-06-24 Thread Umesh Prasad
Hi Joel,
   Had missed this email .. Some issue with my gmail setting.

The reason the CollapsingQParserPlugin is more performant than regular
grouping is because:

1.  The QParser refers to global ords for group.field and avoids storing
strings in a set. This has two advantages:
  a) In terms of memory, storing millions of ints vs. strings results in
major savings.
  b) No binary search / lookup is necessary when the segment changes,
resulting in huge computation savings.

2. The cost
CollapsingFieldValue has to maintain score/field value for each unique
ord.
   Memory requirement = number of ords * size of 1 field value.
   The basic types byte, int, float , long etc will consume reasonable
memory.
String/Text value can be stored as ords and will consume only 4 bytes.

The memory requirement is because arrays are dense and it is per request.
Taking an example :
 Index Size = 100 million documents
 Unique ords =  10 million
 Sort field = 4   ( 1 int field + 1 long  field + 2 string/text field)
 Memory  requirement =  40 MB  for  int field  +  80 MB for long field
+ 80 MB for string ords  = 200 MB


I agree 200 MB per request just for collapsing the search results is huge
but at least it increases linearly with number of sort fields.. For my use
case, I am willing to pay the linear cost specially when I can't combine
the sort fields intelligently into a sort function. Plus it allows me to
sort by String/Text fields also which is a big win.

PS :
1. We can also store long/string fields as byte/short ords. For sort
fields where the number of unique values is small (e.g. sort by date,
sales rank, etc.), this will result in significant memory savings.








On 19 June 2014 19:40, Joel Bernstein joels...@gmail.com wrote:

 Umesh, this is a good summary.

 So, the question is what is the cost (performance and memory) of having the
 CollapsingQParserPlugin choose the group head by using the Solr sort
 criteria?

 Keep in mind that the CollapsingQParserPlugin's main design goal is to
 provide fast performance when collapsing on a high cardinality field. How
 you choose the group head can have a big impact here, both on memory
 consumption performance.

 The function query collapse criteria was added to allow you to come up with
 custom formulas for selecting the group head, with little or no impact on
 performance and memory. Using Solr's recip() function query it seems like
 you could come up with some nice scenarios where two variables could be
 used to select the group head. For example:

 fq={!collapse field=a max='sub(prod(cscore(),1000), recip(field(x),1, 1000,
 1000))'}

 This seems like it would basically give you two sort critea: cscore(),
 which returns the score, would be the primary criteria. The recip of field
 x would be the secondary criteria.













 Joel Bernstein
 Search Engineer at Heliosearch


 On Thu, Jun 19, 2014 at 2:18 AM, Umesh Prasad umesh.i...@gmail.com
 wrote:

  Continuing the discussion on mailing list from Jira.
 
  An Example
 
 
  *id  group   f1  f2*1   g1
  5   10
  2   g1 5   1000
  3   g1 5   1000
  4   g1 10  100
  5   g2 5   10
  6   g2 5   1000
  7   g2 5   1000
  8   g210  100
 
  sort= f1 asc, f2 desc , id desc
 
 
  *Without collapse will give : *
  (7,g2), (6,g2),  (3,g1), (2,g1), (5,g2), (1,g1), (8,g2), (4,g1)
 
 
  *On collapsing by group_s  expected output is : *  (7,g2), (3,g1)
 
  solr standard collapsing does give this output  with
  group=on,group.field=group_s,group.main=true
 
  * Collapsing with CollapsingQParserPlugin* fq={!collapse field=group_s} :
(5,g2), (1,g1)
 
 
 
  * Summarizing Jira Discussion :*
  1. CollapsingQParserPlugin picks up the group heads from matching results
  and passes those further. So in essence filtering some of the matching
  documents, so that subsequent collectors never see them. It can also pass
  on score to subsequent collectors using a dummy scorer.
 
  2. TopDocCollector comes later in hierarchy and it will sort on the
  collapsed set. That works fine.
 
  The issue is with step 1. Collapsing is done by a single comparator which
  can take its value from a field or function. It defaults to score.
  Function queries do allow us to combine multiple fields / value sources,
  however it would be difficult to construct a function for given sort
  fields. Primarily because
  a) The range of values for a given sort field is not known in
 advance.
  It is possible for one sort field to unbounded, but other to be bounded
  within a small range.
  b) The sort field can itself hold custom logic.
 
  Because of (a) the group head selected by CollapsingQParserPlugin will be
  incorrect and subsequent sorting will break.
 
 
 
  On 14 June

CollapsingQParserPlugin throws Exception when useFilterForSortedQuery=true

2014-06-24 Thread Umesh Prasad
(ThreadLeakControl.java:360)
at
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
at
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
at
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
at java.lang.Thread.run(Thread.java:745)




---
Thanks  Regards
Umesh Prasad


Re: Bug in Collapsing QParserPlugin : Sort by 3 or more fields is broken

2014-06-19 Thread Umesh Prasad
Continuing the discussion on mailing list from Jira.

An Example


id  group   f1    f2
1   g1       5    10
2   g1       5  1000
3   g1       5  1000
4   g1      10   100
5   g2       5    10
6   g2       5  1000
7   g2       5  1000
8   g2      10   100

sort= f1 asc, f2 desc , id desc


Without collapse this will give:
(7,g2), (6,g2), (3,g1), (2,g1), (5,g2), (1,g1), (8,g2), (4,g1)


On collapsing by group_s the expected output is: (7,g2), (3,g1)

Solr standard collapsing does give this output with
group=on, group.field=group_s, group.main=true

Collapsing with CollapsingQParserPlugin, fq={!collapse field=group_s}, gives:
  (5,g2), (1,g1)



Summarizing the Jira discussion:
1. CollapsingQParserPlugin picks up the group heads from matching results
and passes those further. So in essence filtering some of the matching
documents, so that subsequent collectors never see them. It can also pass
on score to subsequent collectors using a dummy scorer.

2. TopDocCollector comes later in hierarchy and it will sort on the
collapsed set. That works fine.

The issue is with step 1. Collapsing is done by a single comparator which
can take its value from a field or function. It defaults to score.
Function queries do allow us to combine multiple fields / value sources,
however it would be difficult to construct a function for given sort
fields. Primarily because
a) The range of values for a given sort field is not known in advance.
It is possible for one sort field to be unbounded, but another to be
bounded within a small range.
b) The sort field can itself hold custom logic.

Because of (a) the group head selected by CollapsingQParserPlugin will be
incorrect and subsequent sorting will break.



On 14 June 2014 12:38, Umesh Prasad umesh.i...@gmail.com wrote:

 Thanks Joel for the quick response. I have opened a new jira ticket.

 https://issues.apache.org/jira/browse/SOLR-6168




 On 13 June 2014 17:45, Joel Bernstein joels...@gmail.com wrote:

 Let's open a new ticket.

 Joel Bernstein
 Search Engineer at Heliosearch


 On Fri, Jun 13, 2014 at 8:08 AM, Umesh Prasad umesh.i...@gmail.com
 wrote:

  The patch in SOLR-5408 fixes the issue with sorting only for two sort
  fields. Sorting still breaks when 3 or more sort fields are used.
 
  I have attached a test case, which demonstrates the broken behavior
 when 3
  sort fields are used.
 
  The failing test case patch is against Lucene/Solr 4.7 revision  number
  1602388
 
  Can someone apply and verify the bug ?
 
  Also, should I re-open SOLR-5408  or open a new ticket ?
 
 
  ---
  Thanks  Regards
  Umesh Prasad
 




 --
 ---
 Thanks  Regards
 Umesh Prasad




-- 
---
Thanks  Regards
Umesh Prasad


Re: Bug in Collapsing QParserPlugin : Sort by 3 or more fields is broken

2014-06-14 Thread Umesh Prasad
Thanks Joel for the quick response. I have opened a new jira ticket.

https://issues.apache.org/jira/browse/SOLR-6168




On 13 June 2014 17:45, Joel Bernstein joels...@gmail.com wrote:

 Let's open a new ticket.

 Joel Bernstein
 Search Engineer at Heliosearch


 On Fri, Jun 13, 2014 at 8:08 AM, Umesh Prasad umesh.i...@gmail.com
 wrote:

  The patch in SOLR-5408 fixes the issue with sorting only for two sort
  fields. Sorting still breaks when 3 or more sort fields are used.
 
  I have attached a test case, which demonstrates the broken behavior when
 3
  sort fields are used.
 
  The failing test case patch is against Lucene/Solr 4.7 revision  number
  1602388
 
  Can someone apply and verify the bug ?
 
  Also, should I re-open SOLR-5408  or open a new ticket ?
 
 
  ---
  Thanks  Regards
  Umesh Prasad
 




-- 
---
Thanks  Regards
Umesh Prasad


Bug in Collapsing QParserPlugin : Sort by 3 or more fields is broken

2014-06-13 Thread Umesh Prasad
The patch in SOLR-5408 fixes the issue with sorting only for two sort
fields. Sorting still breaks when 3 or more sort fields are used.

I have attached a test case, which demonstrates the broken behavior when 3
sort fields are used.

The failing test case patch is against Lucene/Solr 4.7 revision  number
1602388

Can someone apply and verify the bug ?

Also, should I re-open SOLR-5408  or open a new ticket ?


---
Thanks  Regards
Umesh Prasad


Re: CollapsingQParserPlugin scores incorrectly in Solr 4.6.0 when multiple sort criteria are used

2013-12-14 Thread Umesh Prasad
Thanks a lot Joel ..
For now I have taken it from trunk and verified the patched code works
fine ..



On Thu, Dec 12, 2013 at 9:21 PM, Joel Bernstein joels...@gmail.com wrote:

 Hi,

 This is a known issue resolved in SOLR-5408. It's fixed in trunk and 4x and
 if there is a 4.6.1 it will be in there. If not it will be Solr 4.7.

 https://issues.apache.org/jira/browse/SOLR-5408

 Joel


 On Wed, Dec 11, 2013 at 11:36 PM, Umesh Prasad umesh.i...@gmail.com
 wrote:

  Issue occurs in Single Segment index also ..
 
  sort: score desc,floSalesRank asc
  response: {
 
 - numFound: 21461,
 - start: 0,
 - maxScore: 4.4415073,
 - docs: [
- {
   - floSalesRank: 0,
   - score: 0.123750895,
   - [docid]: 9208
   -
 
 
 
 
  On Thu, Dec 12, 2013 at 9:50 AM, Umesh Prasad umesh.i...@gmail.com
  wrote:
 
   Hi All,
   I am using new CollapsingQParserPlugin for Grouping and found that
 it
   works incorrectly when I use multiple sort criteria.
  
  
  
  
 
 http://localhost:8080/solr/toys/select/?q=car%20and%20toysversion=2.2start=0rows=10indent=onsort=score%20desc,floSalesRank%20ascfacet=onfacet.field=store_pathfacet.mincount=1bq=store_path:%22mgl/ksc/gcv%22
  
 
 ^10wt=jsonfl=score,floSalesRank,[docid]bq=id:STFDCHZM3552AHXE^1000fq={!collapse%20field=item_id}
  
  
  - sort: score desc,floSalesRank asc,
  - fl: score,floSalesRank,[docid],
  - start: 0,
  - q: car and toys,
  - facet.field: store_path,
  - fq: {!collapse field=item_id}
  
  
   response:
  
   {
  
  - numFound: 21461,
  - start: 0,
  - maxScore: 4.447499,
  - docs: [
 - {
- floSalesRank: 0,
- score: 0.12396862,
- [docid]: 9703
},
 - {
 -
  
  
   I found a bug opened for same
   https://issues.apache.org/jira/browse/SOLR-5408 ..
  
  
   The bug is closed but I am not really sure that it works specially for
   Multiple segment parts ..
  
   I am using Solr 4.6.0 and my index contains 4 segments ..
  
   Have anyone else faced the same issue ?
  
   ---
   Thanks  Regards
   Umesh Prasad
  
 
 
 
  --
  ---
  Thanks  Regards
  Umesh Prasad
 



 --
 Joel Bernstein
 Search Engineer at Heliosearch




-- 
---
Thanks  Regards
Umesh Prasad


CollapsingQParserPlugin scores incorrectly in Solr 4.6.0 when multiple sort criteria are used

2013-12-11 Thread Umesh Prasad
Hi All,
I am using new CollapsingQParserPlugin for Grouping and found that it
works incorrectly when I use multiple sort criteria.


http://localhost:8080/solr/toys/select/?q=car%20and%20toys&version=2.2&start=0&rows=10&indent=on&sort=score%20desc,floSalesRank%20asc&facet=on&facet.field=store_path&facet.mincount=1&bq=store_path:%22mgl/ksc/gcv%22
^10&wt=json&fl=score,floSalesRank,[docid]&bq=id:STFDCHZM3552AHXE^1000&fq={!collapse%20field=item_id}


   - sort: score desc,floSalesRank asc,
   - fl: score,floSalesRank,[docid],
   - start: 0,
   - q: car and toys,
   - facet.field: store_path,
   - fq: {!collapse field=item_id}


response:

{

   - numFound: 21461,
   - start: 0,
   - maxScore: 4.447499,
   - docs: [
  - {
 - floSalesRank: 0,
 - score: 0.12396862,
 - [docid]: 9703
 },
  - {
  -


I found a bug opened for same
https://issues.apache.org/jira/browse/SOLR-5408 ..


The bug is closed, but I am not really sure that the fix works, especially
with multi-segment indexes.

I am using Solr 4.6.0 and my index contains 4 segments.

Has anyone else faced the same issue?

---
Thanks  Regards
Umesh Prasad


Re: CollapsingQParserPlugin scores incorrectly in Solr 4.6.0 when multiple sort criteria are used

2013-12-11 Thread Umesh Prasad
Issue occurs in Single Segment index also ..

sort: score desc,floSalesRank asc
response: {

   - numFound: 21461,
   - start: 0,
   - maxScore: 4.4415073,
   - docs: [
  - {
 - floSalesRank: 0,
 - score: 0.123750895,
 - [docid]: 9208
 -




On Thu, Dec 12, 2013 at 9:50 AM, Umesh Prasad umesh.i...@gmail.com wrote:

 Hi All,
 I am using new CollapsingQParserPlugin for Grouping and found that it
 works incorrectly when I use multiple sort criteria.



 http://localhost:8080/solr/toys/select/?q=car%20and%20toysversion=2.2start=0rows=10indent=onsort=score%20desc,floSalesRank%20ascfacet=onfacet.field=store_pathfacet.mincount=1bq=store_path:%22mgl/ksc/gcv%22
 ^10wt=jsonfl=score,floSalesRank,[docid]bq=id:STFDCHZM3552AHXE^1000fq={!collapse%20field=item_id}


- sort: score desc,floSalesRank asc,
- fl: score,floSalesRank,[docid],
- start: 0,
- q: car and toys,
- facet.field: store_path,
- fq: {!collapse field=item_id}


 response:

 {

- numFound: 21461,
- start: 0,
- maxScore: 4.447499,
- docs: [
   - {
  - floSalesRank: 0,
  - score: 0.12396862,
  - [docid]: 9703
  },
   - {
   -


 I found a bug opened for same
 https://issues.apache.org/jira/browse/SOLR-5408 ..


 The bug is closed but I am not really sure that it works specially for
 Multiple segment parts ..

 I am using Solr 4.6.0 and my index contains 4 segments ..

 Have anyone else faced the same issue ?

 ---
 Thanks  Regards
 Umesh Prasad




-- 
---
Thanks  Regards
Umesh Prasad


Re: Solr Core Reload causing JVM Memory Leak through FieldCache/LRUCache/LFUCache

2013-11-15 Thread Umesh Prasad
Mailing list by default removes attachments. So uploaded it to google drive
..

https://drive.google.com/file/d/0B-RnB4e-vaJhX280NVllMUdHYWs/edit?usp=sharing



On Fri, Nov 15, 2013 at 2:28 PM, Umesh Prasad umesh.i...@gmail.com wrote:

 Hi All,
 We are seeing memory leaks in our Search application whenever core
 reload happens after replication.
We are using Solr 3.6.2 and I have observed this consistently on all
 servers.

 The leak suspect analysis from MAT is attached with the mail.

  #1425afb4a706064b_  Problem Suspect 1

 One instance of *org.apache.lucene.search.FieldCacheImpl*loaded by 
 *org.apache.catalina.loader.WebappClassLoader
 @ 0x7f7b0a5b8b30* occupies *8,726,099,312 (35.49%)* bytes. The memory is
 accumulated in one instance of*java.util.HashMap$Entry[]* loaded by 
 *system
 class loader*.

 *Keywords*
 org.apache.catalina.loader.WebappClassLoader @ 0x7f7b0a5b8b30
 java.util.HashMap$Entry[]
 org.apache.lucene.search.FieldCacheImpl

 Problem Suspect 2

 69 instances of *org.apache.solr.util.ConcurrentLRUCache*, loaded by 
 *org.apache.catalina.loader.WebappClassLoader
 @ 0x7f7b0a5b8b30* occupy *6,309,187,392 (25.66%)* bytes.

 Biggest instances:

- org.apache.solr.util.ConcurrentLRUCache @
0x7f7fe74ef120 - 755,575,672 (3.07%) bytes.
- org.apache.solr.util.ConcurrentLRUCache @
0x7f7e74b7a068 - 728,731,344 (2.96%) bytes.
- org.apache.solr.util.ConcurrentLRUCache @
0x7f7d0a6bd1b8 - 711,828,392 (2.90%) bytes.
- org.apache.solr.util.ConcurrentLRUCache @
0x7f7c6c12e800 - 708,657,624 (2.88%) bytes.
- org.apache.solr.util.ConcurrentLRUCache @
0x7f7fcb092058 - 568,473,352 (2.31%) bytes.
- org.apache.solr.util.ConcurrentLRUCache @
0x7f7f268cb2f0 - 568,400,040 (2.31%) bytes.
- org.apache.solr.util.ConcurrentLRUCache @
0x7f7e31b60c58 - 544,078,600 (2.21%) bytes.
- org.apache.solr.util.ConcurrentLRUCache @
0x7f7e65c2b2d8 - 489,578,480 (1.99%) bytes.
- org.apache.solr.util.ConcurrentLRUCache @
0x7f7d81ea8538 - 467,833,720 (1.90%) bytes.
- org.apache.solr.util.ConcurrentLRUCache @
0x7f7f31996508 - 444,383,992 (1.81%) bytes.



 *Keywords*
 org.apache.catalina.loader.WebappClassLoader @ 0x7f7b0a5b8b30
 org.apache.solr.util.ConcurrentLRUCache
 Details » http://pages/24.html

 194 instances of *org.apache.solr.util.ConcurrentLFUCache*, loaded by 
 *org.apache.catalina.loader.WebappClassLoader
 @ 0x7f7b0a5b8b30* occupy *4,583,727,104 (18.64%)* bytes.

 Biggest instances:

- org.apache.solr.util.ConcurrentLFUCache @
0x7f7cdd4735a0 - 410,628,176 (1.67%) bytes.
- org.apache.solr.util.ConcurrentLFUCache @
0x7f7c7d48e180 - 390,690,864 (1.59%) bytes.
- org.apache.solr.util.ConcurrentLFUCache @
0x7f7f1edfd008 - 348,193,312 (1.42%) bytes.
- org.apache.solr.util.ConcurrentLFUCache @
0x7f7f37b01990 - 340,595,920 (1.39%) bytes.
- org.apache.solr.util.ConcurrentLFUCache @
0x7f7fe02d8dd8 - 274,611,632 (1.12%) bytes.
- org.apache.solr.util.ConcurrentLFUCache @
0x7f7fa9dcfb20 - 253,848,232 (1.03%) bytes.



 *Keywords*
 org.apache.catalina.loader.WebappClassLoader @ 0x7f7b0a5b8b30
 org.apache.solr.util.ConcurrentLFUCache


 ---
 Thanks  Regards
 Umesh Prasad

 SDE @ Flipkart  : The Online Megastore at your doorstep ..




-- 
---
Thanks  Regards
Umesh Prasad


Re: Hard Commit giving OOM Error on Index Writer in Solr 4.2.1

2013-05-22 Thread Umesh Prasad
Hi Shawn,
Thanks for the advice :). The JVM heap usage on the indexer machine
has been consistently about 95% (both total and old gen) for the past 3
days. It might have nothing to do with Solr 3.6 vs Solr 4.2, because the
Solr 3.6 indexer gets restarted once every 2-3 days.
  Will investigate why memory usage is so high on the indexer.



On Wed, May 22, 2013 at 10:03 AM, Shawn Heisey s...@elyograg.org wrote:

 On 5/21/2013 9:22 PM, Umesh Prasad wrote:
  This is our own implementation of data source (canon name
  com.flipkart.w3.solr.MultiSPCMSProductsDataSource) , which pulls the data
  from out downstream service and it doesn't cache data in RAM. It fetches
  the data in batches of 200 and iterates over it when DIH asks for it. I
  will check the possibility of leak, but unlikely.
 Can OOM issue be because during analysis, IndexWriter finds the
  document to be too large to fit in 100 MB memory and can't flush to disk
 ?
  Our analyzer chain doesn't make easy (specially with a field like) (does
 a
  cross product of synonyms terms)

 If your documents are really large (hundreds of KB, or a few MB), you
 might need a bigger ramBufferSizeMB value ... but if that were causing
 problems, I would expect it to show up during import, not at commit time.

 How much of your 32GB heap is in use during indexing?  Would you be able
 to try with the heap at 31GB instead of 32GB?  One of Java's default
 optimizations (UseCompressedOops) gets turned off with a heap size of
 32GB because it doesn't work any more, and that might lead to strange
 things happening.

 Do you have the ability to try 4.3 instead of 4.2.1?

 Thanks,
 Shawn




-- 
---
Thanks  Regards
Umesh Prasad


Re: Hard Commit giving OOM Error on Index Writer in Solr 4.2.1

2013-05-21 Thread Umesh Prasad
We have sufficient RAM on machine ..64 GB and we have given JVM 32 GB of
memory. The machine runs Indexing primarily.

The JVM doesn't run out of memory. It is the particular IndexWriterSolrCore
which has .. May be we have specified too low a memory for IndexWriter ..

We index mainly product data and use DIH to pull data from downstream
services. Autocommiit is off. The commit is infrequent  for legacy
reasons.. 1 commit in 2-3 hrs. It it makes a difference, then, a Core can
have more than10 lakh documents uncommitted at a time. IndexWriter has a
memory of 100 MB
 We ran with same config on Solr 3.5 and we never ran out of Memory.
But then, I hadn't tried hard commits on Solr 3.5.

Data-Source Entry :
<dataConfig>
  <dataSource name="products" type="MultiSPCMSProductsDataSource"
              spCmsHost="$config.spCmsHost" spCmsPort="$config.spCmsPort"
              spCmsTimeout="3" cmsBatchSize="200" psURL="$config.psUrl"
              autoCommit="false"/>
  <document name="products">
    <entity name="item" pk="id"
            transformer="w3.solr.transformers.GenericProductsTransformer"
            dataSource="products">
    </entity>
  </document>
</dataConfig>

IndexConfig.

<ramBufferSizeMB>100</ramBufferSizeMB>
<maxMergeDocs>2147483647</maxMergeDocs>
<maxFieldLength>5</maxFieldLength>
<writeLockTimeout>1000</writeLockTimeout>
<commitLockTimeout>1</commitLockTimeout>





On Tue, May 21, 2013 at 7:07 PM, Jack Krupansky j...@basetechnology.comwrote:

 Try again on a machine with more memory. Or did you do that already?

 -- Jack Krupansky

 -Original Message- From: Umesh Prasad
 Sent: Tuesday, May 21, 2013 1:57 AM
 To: solr-user@lucene.apache.org
 Subject: Hard Commit giving OOM Error on Index Writer in Solr 4.2.1


 Hi All,
   I am hitting an OOM error while trying to do an hard commit on one of
 the cores.

 Transaction log dir is Empty and DIH shows indexing going on for  13 hrs..

 *Indexing since 13h 22m 22s*
 Requests: 5,211,392 (108/s), Fetched: 1,902,792 (40/s), Skipped: 106,853,
 Processed: 1,016,696 (21/s)
 Started: about 13 hours ago



 <response>
 <lst name="responseHeader"><int name="status">500</int><int name="QTime">4</int></lst>
 <lst name="error"><str name="msg">this writer hit an OutOfMemoryError; cannot commit</str>
 <str name="trace">java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
 at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2661)
 at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2827)
 at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2807)
 at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:536)
 at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
 at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
 at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1055)
 at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157)
 at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
 at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:554)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
 at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
 at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
 at java.lang.Thread.run(Thread.java:662)

Re: Hard Commit giving OOM Error on Index Writer in Solr 4.2.1

2013-05-21 Thread Umesh Prasad
Hi Shawn,
This is our own implementation of a data source (canonical name
com.flipkart.w3.solr.MultiSPCMSProductsDataSource), which pulls the data
from our downstream service and doesn't cache data in RAM. It fetches
the data in batches of 200 and iterates over it when DIH asks for it. I
will check for the possibility of a leak, but it seems unlikely.
   Can the OOM be because, during analysis, IndexWriter finds the
document too large to fit in the 100 MB buffer and can't flush it to disk?
Our analyzer chain doesn't make it easy, especially with a field like the
one below, which does a cross product of synonym terms:

<fieldType name="textStemmed" class="solr.TextField" indexed="true"
           stored="false" multiValued="true" positionIncrementGap="100"
           omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_index.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_index.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_index.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_index.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  </analyzer>
</fieldType>




On Wed, May 22, 2013 at 5:03 AM, Shawn Heisey s...@elyograg.org wrote:

 On 5/21/2013 5:14 PM, Umesh Prasad wrote:

 We have sufficient RAM on machine ..64 GB and we have given JVM 32 GB of
 memory. The machine runs Indexing primarily.

 The JVM doesn't run out of memory. It is the particular
 IndexWriterSolrCore
 which has .. May be we have specified too low a memory for IndexWriter ..

 We index mainly product data and use DIH to pull data from downstream
 services. Autocommiit is off. The commit is infrequent  for legacy
 reasons.. 1 commit in 2-3 hrs. It it makes a difference, then, a Core can
 have more than10 lakh documents uncommitted at a time. IndexWriter has a
 memory of 100 MB
   We ran with same config on Solr 3.5 and we never ran out of Memory.
 But then, I hadn't tried hard commits on Solr 3.5.


 Hard commits are the only kind of commits that Solr 3.x has.  It's soft
 commits that are new with 4.x.


  Data-Source Entry :
 dataConfig
 dataSource name=products type=MultiSPCMSProductsDataSource


 This appears to be using a custom data source, not one of the well-known
 types.  If it had been JDBC, I would be saying that your JDBC driver is
 trying to cache the entire result set in RAM.  With a MySQL data source, a
 batchSize of -1 fixes this problem, by internally changing the JDBC
 fetchSize to Integer.MIN_VALUE.  Other databases have different mechanisms.

 With this data source, I have no idea at all how to make sure that it
 doesn't cache all results in RAM.  It might be that the combination of the
 new Solr and this custom data source causes a memory leak, something that
 doesn't happen with the old Solr version.

 You said that the transaction log directory is empty.  That rules out one
 possibility, which would be solved by the autoCommit settings on this page:

 http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup

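 A sketch of the settings that page is talking about, assuming the stock
 updateHandler section of solrconfig.xml (the numbers are illustrative, not
 recommendations):

 <updateHandler class="solr.DirectUpdateHandler2">
   <autoCommit>
     <maxDocs>10000</maxDocs>
     <maxTime>60000</maxTime>
     <openSearcher>false</openSearcher>
   </autoCommit>
 </updateHandler>
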
 Aside from the memory leak idea, or possibly having your entire source
 data cached in RAM, I have no idea what's happening here.

 Thanks,
 Shawn




-- 
---
Thanks & Regards
Umesh Prasad


Hard Commit giving OOM Error on Index Writer in Solr 4.2.1

2013-05-20 Thread Umesh Prasad
Hi All,
   I am hitting an OOM error while trying to do a hard commit on one of
the cores.

The transaction log dir is empty and DIH shows indexing going on for > 13 hrs:

Indexing since 13h 22m 22s
Requests: 5,211,392 (108/s), Fetched: 1,902,792 (40/s), Skipped: 106,853,
Processed: 1,016,696 (21/s)
Started: about 13 hours ago



<response>
<lst name="responseHeader"><int name="status">500</int><int name="QTime">4</int></lst>
<lst name="error"><str name="msg">this writer hit an OutOfMemoryError; cannot commit</str>
<str name="trace">java.lang.IllegalStateException: this writer hit an
OutOfMemoryError; cannot commit
at
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2661)
at
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2827)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2807)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:536)
at
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1055)
at
org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157)
at
org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:554)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)




-- 
---
Thanks & Regards
Umesh Prasad


Re: Solr 4.2 Startup Detects Corrupt Log And is Really Slow to Start

2013-04-22 Thread Umesh Prasad
Sorry for the late reply. I was trying to change our indexing pipeline to do
explicit intermediate commits for each core. That turned out to be a bit
more work than I have time for.

So, I do want to explore hard commits. I tried
solr-host:port/solr/core/update?commit=true, but there is no
impact on the transaction log size, so I feel it must be getting ignored.

So can someone tell me how to do the hard commits?

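The only other form I am aware of is POSTing an explicit commit message to
the update handler. A sketch, assuming the default /update handler and the
XML update format (host, port and core name are placeholders):

<commit waitSearcher="true"/>

sent as the body of a POST to solr-host:port/solr/core/update with
Content-Type text/xml. I have not verified whether it behaves any
differently from commit=true.
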
@Shawn : openSearcher=false is not an option. On each commit, the index will
be replicated to the slaves, which will open a searcher on it immediately and
can end up serving an intermediate state. The longer term and better solution
is changing the indexing pipeline and doing explicit commits, but I can't
implement that right now.




On 18 Apr 2013 00:35, Shawn Heisey s...@elyograg.org wrote:

 On 4/17/2013 11:56 AM, Mark Miller wrote:

 There is one additional caveat - when you disable the updateLog, you have
 to switch to MMapDirectoryFactory instead of NRTCachingDirectoryFactory.
  The NRT directory implementation will cache a portion of a commit
 (including hard commits) into RAM instead of onto disk.  On the next
 commit, the previous one is persisted completely to disk.  Without a
 transaction log, you can lose data.


 I don't think this is true? NRTCachingDirectoryFactory should not cache
 hard commits and should be as safe as MMapDirectoryFactory is - neither of
 which is as safe as using a tran log.


 This is based on observations of what happens with my segment files when I
 do a full-import, using autoCommit with openSearcher disabled.  I see that
 each autoCommit results in a full segment being written, the part of
 another segment.  On the next autoCommit, the rest of the files for the
 last segment are written, another full segment is written, I get another
 partial segment.  I asked about this on the list some time ago, and what I
 just told Umesh is a rehash of what I understood from Yonik's response.

 If I'm wrong, I hope someone who knows for sure can correct me.

 Thanks,
 Shawn




Re: Solr 4.2 Startup Detects Corrupt Log And is Really Slow to Start

2013-04-17 Thread Umesh Prasad
Thanks Erick.

Couple of Questions :
Our transaction logs are huge as we have disabled auto commit. The biggest
one is 6.1 GB.

567M    autosuggest/data/tlog
22M     avmediaCore/data/tlog
388M    booksCore/data/tlog
4.9G    books/data/tlog
6.1G    mp3-downloads/data/tlog   (150% of index size)
1.5G    next-5/data/tlog
690M    queries/data/tlog         (25% of index size)
207M    queryProduct/data/tlog    (100% of index size)

Btw, I am surprised by the size of the transaction logs, because they are a
significant fraction of the index size itself:

2.6G    autosuggest/data/index
992M    avmediaCore/data/index
12G     booksCore/data/index
4.2G    mp3-downloads-new/data/index
45G     next-5/data/index
2.9G    queries/data/index
220M    queryProduct/data/index


We use DIH and have turned off auto commit because we sometimes have to
build the index from scratch (clean=true) and we do not want intermediate
states to be committed and replicated.
Our master server sees a lot of restarts, sometimes 2-3 times a day. It
polls our other data sources for updates, and there are quite a few of them.
The master keeps track of the last committed version and can handle
uncommitted changes.

Given the frequent restarts, we can't really afford a huge startup time at
this point.
 In the worst case, does Solr allow disabling the transaction log?

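From what I can tell, that would just mean commenting out the updateLog
element under updateHandler in solrconfig.xml; is that safe to do? A sketch,
based on the stock example config rather than our actual file:

<updateHandler class="solr.DirectUpdateHandler2">
  <!--
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  -->
</updateHandler>
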
Here is our Index Config

<indexConfig>
    <!-- Values here affect all index writers and act as a default unless
         overridden. -->
    <useCompoundFile>false</useCompoundFile>

    <mergeFactor>10</mergeFactor>
    <ramBufferSizeMB>32</ramBufferSizeMB>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>5</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>1</commitLockTimeout>

    <lockType>single</lockType>

    <!-- options specific to the main on-disk lucene index -->
    <useCompoundFile>false</useCompoundFile>
    <ramBufferSizeMB>32</ramBufferSizeMB>
    <mergeFactor>5</mergeFactor>
    <!-- Deprecated -->
    <!-- <maxBufferedDocs>1000</maxBufferedDocs> -->
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>5</maxFieldLength>

    <unlockOnStartup>false</unlockOnStartup>

    <deletionPolicy class="solr.SolrDeletionPolicy">
      <!-- The number of commit points to be kept -->
      <str name="maxCommitsToKeep">5</str>
      <!-- The number of optimized commit points to be kept -->
      <str name="maxOptimizedCommitsToKeep">0</str>
      <str name="maxCommitAge">2HOUR</str>
    </deletionPolicy>
</indexConfig>



Thanks & Regards
Umesh Prasad



On Wed, Apr 17, 2013 at 4:57 PM, Erick Erickson erickerick...@gmail.com wrote:

 How big are you transaction logs? They can be replayed on startup.
 They are truncated and a new one started when you do a hard commit
 (openSearcher true or false doesn't matter).

 So a quick test of this theory would be to just stop your indexing
 process, issue a hard commit on all your cores and _then_ try to
 restart. If it comes up immediately, you've identified your problem.

 Best
 Erick

 On Tue, Apr 16, 2013 at 8:33 AM, Umesh Prasad umesh.i...@gmail.com
 wrote:
  Hi,
  We are migrating to Solr 4.2 from Solr 3.6 and Solr 4.2 is throwing
  Exception on Restart. What is More, it take a hell lot of Time ( More
 than
  one hour to get Up and Running)
 
 
  THE exception After Restart ...
  =
  Apr 16, 2013 4:47:31 PM org.apache.solr.update.UpdateLog$RecentUpdates
  update
  WARNING: Unexpected log entry or corrupt log.  Entry=11
  java.lang.ClassCastException: java.lang.Long cannot be cast to
  java.util.List
  at
  org.apache.solr.update.UpdateLog$RecentUpdates.update(UpdateLog.java:929)
  at
 
 org.apache.solr.update.UpdateLog$RecentUpdates.access$000(UpdateLog.java:863)
  at
  org.apache.solr.update.UpdateLog.getRecentUpdates(UpdateLog.java:1014)
  at org.apache.solr.update.UpdateLog.init(UpdateLog.java:253)
  at
  org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:82)
  at
  org.apache.solr.update.UpdateHandler.init(UpdateHandler.java:137)
  at
  org.apache.solr.update.UpdateHandler.init(UpdateHandler.java:123)
  at
 
 org.apache.solr.update.DirectUpdateHandler2.init(DirectUpdateHandler2.java:95)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
  Method)
  at
 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
  at
 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
  at
 java.lang.reflect.Constructor.newInstance(Constructor.java:513)
  at
 org.apache.solr.core.SolrCore.createInstance(SolrCore.java:525)
  at
  org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:596)
  at org.apache.solr.core.SolrCore.init(SolrCore.java:806)
  at org.apache.solr.core.SolrCore.init(SolrCore.java:618)
  at
 
 org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
  at
  org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051

Solr 4.2 Startup Detects Corrupt Log And is Really Slow to Start

2013-04-16 Thread Umesh Prasad
 org.apache.solr.handler.SnapPuller fetchLatestIndex
SEVERE: Master at: http://localhost:25280/solr/cameras is not available.
Index fetch failed. Exception:
org.apache.solr.client.solrj.SolrServerException: Timeout occured while
waiting response from server at: http://localhost:25280/solr/cameras


 Before Restart : Server was running Incremental Indexing ( Triggered by a
Cron). The cron triggers every 5 mins for each of about 40 Cores. This was
the same with Solr 3.5 also. But we never faced any issues.



-- 
---
Thanks & Regards
Umesh Prasad


Re: Downloaded Solr 4.2.1 Source: Build Failing

2013-04-14 Thread Umesh Prasad
java version 1.6.0_43
Java(TM) SE Runtime Environment (build 1.6.0_43-b01-447-11M4203)
Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01-447, mixed mode)

Mac OS X : Version 10.7.5

--
Umesh



On Sat, Apr 13, 2013 at 12:08 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 :
 /Users/umeshprasad/Downloads/solr-4.2.1/solr/core/src/java/org/apache/solr/handler/c
 : *omponent/QueryComponent.java:765: cannot find symbol
 : [javac] symbol  : class ShardFieldSortedHitQueue
 : [javac] location: class
 org.apache.solr.handler.component.QueryComponent
 : [javac]   ShardFieldSortedHitQueue queue;*

 Weird ... can you provide us more details about the java compiler you are
 using?

 ShardFieldSortedHitQueue is a package protected class declared in
 ShardDoc.java (in the same package as QueryComponent).  That isn't exactly
 a best practice, but it shouldn't be causing a compilation failure.


 -Hoss




-- 
---
Thanks & Regards
Umesh Prasad


Re: Downloaded Solr 4.2.1 Source: Build Failing

2013-04-14 Thread Umesh Prasad
Further update on the same issue: the build from the branch
http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_1 succeeds
fine. The build fails only for the source code downloaded from
http://apache.techartifact.com/mirror/lucene/solr/4.2.1/solr-4.2.1-src.tgz




On Sun, Apr 14, 2013 at 1:05 PM, Umesh Prasad umesh.i...@gmail.com wrote:

  j*ava version 1.6.0_43
 Java(TM) SE Runtime Environment (build 1.6.0_43-b01-447-11M4203)
 Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01-447, mixed mode)
 *
 Mac OS X : Version 10.7.5

 --
 Umesh



 On Sat, Apr 13, 2013 at 12:08 AM, Chris Hostetter 
 hossman_luc...@fucit.org wrote:


 :
 /Users/umeshprasad/Downloads/solr-4.2.1/solr/core/src/java/org/apache/solr/handler/c
 : *omponent/QueryComponent.java:765: cannot find symbol
 : [javac] symbol  : class ShardFieldSortedHitQueue
 : [javac] location: class
 org.apache.solr.handler.component.QueryComponent
 : [javac]   ShardFieldSortedHitQueue queue;*

 Weird ... can you provide us more details about the java compiler you are
 using?

 ShardFieldSortedHitQueue is a package protected class declared in
 ShardDoc.java (in the same package as QueryComponent).  That isn't exactly
 a best practice, but it shouldn't be causing a compilation failure.


 -Hoss




 --
 ---
 Thanks  Regards
 Umesh Prasad




-- 
---
Thanks & Regards
Umesh Prasad


Re: Not able to replicate the solr 3.5 indexes to solr 4.2 indexes

2013-04-13 Thread Umesh Prasad
Hi Erick,
I have already created a Jira and also attached a Path. But no unit
tests. My local build is failing (building from solr 4.2.1 source jar).
Please see
https://issues.apache.org/jira/browse/SOLR-4703
.
--
Umesh


On Sat, Apr 13, 2013 at 7:24 PM, Erick Erickson erickerick...@gmail.com wrote:

 Please make a JIRA and attach as a patch if there aren't any JIRAs
 for this yet.

 Best
 Erick

 On Fri, Apr 12, 2013 at 1:58 AM, Montu v Boda
 montu.b...@highqsolutions.com wrote:
  hi
 
  thanks for your reply.
 
  is anyone is going to fix this issue in new solr version? because there
 are
  so many guys facing the same problem while upgrading the solr index
 3.5.0 to
  solr 4.2
 
  Thanks  Regards
  Montu v Boda
 
 
 
  --
  View this message in context:
 http://lucene.472066.n3.nabble.com/Not-able-to-replicate-the-solr-3-5-indexes-to-solr-4-2-indexes-tp4055313p4055477.html
  Sent from the Solr - User mailing list archive at Nabble.com.




-- 
---
Thanks & Regards
Umesh Prasad


Downloaded Solr 4.2.1 Source: Build Failing

2013-04-12 Thread Umesh Prasad
common.compile-core:
[javac] Compiling 337 source files to
/Users/umeshprasad/Downloads/solr-4.2.1/solr/build/solr-core/classes/java
[javac] /Users/umeshprasad/Downloads/solr-4.2.1/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java:765: cannot find symbol
[javac] symbol  : class ShardFieldSortedHitQueue
[javac] location: class org.apache.solr.handler.component.QueryComponent
[javac]   ShardFieldSortedHitQueue queue;
[javac]   ^
[javac]
/Users/umeshprasad/Downloads/solr-4.2.1/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java:766:
cannot find symbol
[javac] symbol  : class ShardFieldSortedHitQueue
[javac] location: class org.apache.solr.handler.component.QueryComponent
[javac]   queue = new ShardFieldSortedHitQueue(sortFields,
ss.getOffset() + ss.getCount());
[javac]   ^
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 2 errors


-- 
---
Thanks & Regards
Umesh Prasad


Re: Index Replication Failing in Solr 4.2.1

2013-04-11 Thread Umesh Prasad
Created Jira Issue
https://issues.apache.org/jira/browse/SOLR-4703 and attached the Patch. No
unit tests yet.


On Fri, Apr 12, 2013 at 12:59 AM, Mark Miller markrmil...@gmail.com wrote:

 I was looking for this msg the other day and couldn't find it offhand…

 +1, please add this to JIRA so someone can look into it and it does not
 get lost!

 - Mark

 On Apr 11, 2013, at 11:17 AM, Otis Gospodnetic otis.gospodne...@gmail.com
 wrote:

  Hi Umesh,
 
  The attachment didn't make it through.  Could you please add it to
  JIRA? http://wiki.apache.org/solr/HowToContribute
 
  Thanks,
  Otis
  --
  Solr  ElasticSearch Support
  http://sematext.com/
 
 
 
 
 
  On Wed, Apr 10, 2013 at 9:43 PM, Umesh Prasad umesh.i...@gmail.com
 wrote:
  Root caused the Issue to a Code Bug / Contract Violation  in SnapPuller
 in
  solr 4.2.1 (impacts trunk as well) and Fixed by Patching the SnapPuller
  locally.
 
  fetchfilelist API expects indexversion to be specified as param.
 
  So Call to Master should of be Form :
 
 /solr/phcare/replication?command=filelistgen=108213wt=jsonindexversion=1323961125908
  Instead Slave Calls the Master as :
  /solr/phcare/replication?command=filelistgen=108213wt=json
 
  Code bug lies in SnapPuller.fetchFileList(long gen)  which gets called
 by
  SnapPuller.fetchLatestIndex(final SolrCore core, boolean
 forceReplication)
 
  The fix is pass along the version to fetchFileList and populate it.
 
  A Patch is attached for trunk.
 
 
  Thanks  Regards
  Umesh Prasad
  Search Engineer @ Flipkart : India's Online Megastore
  -
  Empowering Consumers Find Products ..
 
 
 
 
 
  On Tue, Apr 9, 2013 at 9:28 PM, Umesh Prasad umesh.i...@gmail.com
 wrote:
 
  Hi All,
   I am migrating from Solr 3.5.0 to Solr 4.2.1. And everything is
 running
  fine and set to go, except the master slave replication.
 
  We use master slave replication with multi cores ( 1 master, 10 slaves
 and
  20 plus cores).
 
  My Configuration is :
 
  Master :  Solr 3.5.0,  Has existing index, and delta import running
 using
  DIH.
  Slave : Solr 4.2.1 ,  Has no startup index
 
 
  Apr 9, 2013 9:18:40 PM org.apache.solr.core.SolrCore execute
  INFO: [phcare] webapp= path=/replication
  params={command=fetchindex_=1365522520521wt=json} status=0 QTime=1
  Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller
 fetchLatestIndex
  INFO: Master's generation: 107876
  Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller
 fetchLatestIndex
  INFO: Slave's generation: 79248
  Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller
 fetchLatestIndex
  INFO: Starting replication process
  Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller fetchFileList
  SEVERE: No files to download for index generation: 107876
  Apr 9, 2013 9:18:40 PM org.apache.solr.core.SolrCore execute
  INFO: [phcare] webapp= path=/replication
  params={command=details_=1365522520556wt=json} status=0 QTime=7
 
  In Both Master and Slave The File list for replicable version is
 correct.
  on Slave
 
  {
 
  masterDetails: {
 
  indexSize: 4.31 MB,
  indexPath:
  /var/lib/fk-w3-sherlock/cores/phcare/data/index.20130124235012,
  commits: [
 
  [
 
  indexVersion,
  1323961124638,
  generation,
  107856,
  filelist,
  [
 
  _45e1.tii,
  _45e1.nrm,
 
  ..
 
 
  ON Master
 
  [
 
  indexVersion,
  1323961124638,
  generation,
  107856,
  filelist,
  [
 
  _45e1.tii,
  _45e1.nrm,
  _45e2_1.del,
  _45e2.frq,
  _45e1_3.del,
  _45e1.tis,
  ..
 
 
 
  Can someone help. Our whole Migration to Solr 4.2 is blocked on
  Replication issue.
 
  ---
  Thanks  Regards
  Umesh Prasad
 
 
 
 
  --
  ---
  Thanks  Regards
  Umesh Prasad




-- 
---
Thanks & Regards
Umesh Prasad


Re: Index Replication Failing in Solr 4.2.1

2013-04-10 Thread Umesh Prasad
Root caused the issue to a code bug / contract violation in SnapPuller in
Solr 4.2.1 (impacts trunk as well) and fixed it by patching the SnapPuller
locally.

The filelist command expects indexversion to be specified as a param.

So the call to the master should be of the form:
/solr/phcare/replication?command=filelist&gen=108213&wt=json&indexversion=1323961125908
Instead, the slave calls the master as:
/solr/phcare/replication?command=filelist&gen=108213&wt=json

The code bug lies in SnapPuller.fetchFileList(long gen), which gets called by
SnapPuller.fetchLatestIndex(final SolrCore core, boolean forceReplication).

The fix is to pass the version along to fetchFileList and populate it in the request.

A Patch is attached for trunk.


Thanks & Regards
Umesh Prasad
Search Engineer @ Flipkart : India's Online Megastore
-
Empowering Consumers Find Products ..





On Tue, Apr 9, 2013 at 9:28 PM, Umesh Prasad umesh.i...@gmail.com wrote:

 Hi All,
   I am migrating from Solr 3.5.0 to Solr 4.2.1. And everything is running
 fine and set to go, except the master slave replication.

 We use master slave replication with multi cores ( 1 master, 10 slaves and
 20 plus cores).

 My Configuration is :

 Master :  Solr 3.5.0,  Has existing index, and delta import running using
 DIH.
 Slave : Solr 4.2.1 ,  Has no startup index


 Apr 9, 2013 9:18:40 PM org.apache.solr.core.SolrCore execute
 INFO: [phcare] webapp= path=/replication
 params={command=fetchindex_=1365522520521wt=json} status=0 QTime=1
 Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
 *INFO: Master's generation: 107876
 *Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller
 fetchLatestIndex
 *INFO: Slave's generation: 79248
 *Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller
 fetchLatestIndex
 INFO: Starting replication process
 *Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller fetchFileList
 SEVERE: No files to download for index generation: 107876
 *Apr 9, 2013 9:18:40 PM org.apache.solr.core.SolrCore execute
 INFO: [phcare] webapp= path=/replication
 params={command=details_=1365522520556wt=json} status=0 QTime=7

 In Both Master and Slave The File list for replicable version is correct.
 *on Slave *

 {

- masterDetails: {
   - indexSize: 4.31 MB,
   - indexPath:
   /var/lib/fk-w3-sherlock/cores/phcare/data/index.20130124235012,
   - commits: [
  - [
 - indexVersion,
 - 1323961124638,
 - generation,
 - 107856,
 - filelist,
 - [
- _45e1.tii,
- _45e1.nrm,
-

 ..


 *ON Master
 *
 [

- indexVersion,
- 1323961124638,
- generation,
- 107856,
- filelist,
- [
   - _45e1.tii,
   - _45e1.nrm,
   - _45e2_1.del,
   - _45e2.frq,
   - _45e1_3.del,
   - _45e1.tis,
   - ..



 Can someone help. Our whole Migration to Solr 4.2 is blocked on
 Replication issue.

 ---
 Thanks  Regards
 Umesh Prasad




-- 
---
Thanks & Regards
Umesh Prasad


Index Replication Failing in Solr 4.2.1

2013-04-09 Thread Umesh Prasad
Hi All,
  I am migrating from Solr 3.5.0 to Solr 4.2.1. And everything is running
fine and set to go, except the master slave replication.

We use master slave replication with multi cores ( 1 master, 10 slaves and
20 plus cores).

My Configuration is :

Master :  Solr 3.5.0,  Has existing index, and delta import running using
DIH.
Slave : Solr 4.2.1 ,  Has no startup index
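
For context, each slave polls the master through the usual ReplicationHandler
slave section. A rough sketch of that part of solrconfig.xml (host, port and
core name are placeholders, not our real values):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:port/solr/corename</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>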


Apr 9, 2013 9:18:40 PM org.apache.solr.core.SolrCore execute
INFO: [phcare] webapp= path=/replication
params={command=fetchindex&_=1365522520521&wt=json} status=0 QTime=1
Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Master's generation: 107876
Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave's generation: 79248
Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Starting replication process
Apr 9, 2013 9:18:40 PM org.apache.solr.handler.SnapPuller fetchFileList
SEVERE: No files to download for index generation: 107876
Apr 9, 2013 9:18:40 PM org.apache.solr.core.SolrCore execute
INFO: [phcare] webapp= path=/replication
params={command=details&_=1365522520556&wt=json} status=0 QTime=7

In Both Master and Slave The File list for replicable version is correct.
on Slave:

{
  masterDetails: {
    indexSize: 4.31 MB,
    indexPath: /var/lib/fk-w3-sherlock/cores/phcare/data/index.20130124235012,
    commits: [
      [
        indexVersion,
        1323961124638,
        generation,
        107856,
        filelist,
        [
          _45e1.tii,
          _45e1.nrm,
          ..


ON Master:

[
  indexVersion,
  1323961124638,
  generation,
  107856,
  filelist,
  [
    _45e1.tii,
    _45e1.nrm,
    _45e2_1.del,
    _45e2.frq,
    _45e1_3.del,
    _45e1.tis,
    ..



Can someone help? Our whole migration to Solr 4.2 is blocked on this
replication issue.

---
Thanks & Regards
Umesh Prasad


Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-21 Thread Umesh Prasad
[] ASF Mirrors (linked in our release announcements or via the Lucene
website)

[x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)

[x] I/we build them from source via an SVN/Git checkout.

[] Other (someone in your company mirrors them internally or via a
downstream project)

On Fri, Jan 21, 2011 at 10:01 PM, mike anderson saidthero...@gmail.com wrote:
 [x] ASF Mirrors (linked in our release announcements or via the Lucene
 website)

 [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)

 [x] I/we build them from source via an SVN/Git checkout.

 [] Other (someone in your company mirrors them internally or via a
 downstream project)




-- 
---
Thanks & Regards
Umesh Prasad