hits=XXX not always there in solr.log.* file?!?
Hi, I'm puzzled by this issue and was wondering if anyone knows why. Basically I am trying to get hit counts from my solr.log.* files for analysis purposes. However, I noticed that for some requests no hits=xyz value is shown. Here are 2 example log snippets from my solr.log.2010_01_07 file, one with a 'hits=' count and one without, from the same solr instance:

-- query WITH 'hits=xyz' count in log --

INFO: [items] webapp=/solr path=/select params={spellcheck=true&facet=true&sort=item_pubDate+desc&facet.limit=21&hl=true&version=2.2&f.cat_title.facet.sort=index&f.credibility.facet.sort=index&spellcheck.count=1&facet.field={!ex%3Dscat}cat_title&facet.field=user_key&facet.field={!ex%3Dscred}credibility&fq={!tag%3Dscred}credibility:[1+TO+3]&fq=grouping_id:AMS-141002-2010-01-07&fq=-item_id:127272858&fq=-item_id:127272859&f.cat_title.facet.method=fc&f.user_key.facet.mincount=1&spellcheck.extendedResults=true&json.nl=map&hl.fl=item_title+item_desc&wt=json&spellcheck.collate=true&spellcheck.onlyMorePopular=false&rows=100&f.item_title.hl.fragsize=105&start=0&q=Obama&f.item_desc.hl.fragsize=110&f.user_key.facet.method=fc&f.cat_title.facet.mincount=1} hits=755 status=0 QTime=290

-- query WITHOUT 'hits=xyz' count in log --

INFO: [items] webapp=/solr path=/select params={spellcheck=true&collapse.info.doc=false&facet=true&sort=item_pubDate+desc&facet.limit=21&hl=true&f.cat_title.facet.sort=index&version=2.2&collapse.field=grouping_id&f.credibility.facet.sort=index&spellcheck.count=1&facet.field={!ex%3Dscat}cat_title&facet.field=user_key&facet.field={!ex%3Dscred}credibility&fq={!tag%3Dscred}credibility:3&f.cat_title.facet.method=fc&collapse.threshold=2&f.user_key.facet.mincount=1&spellcheck.extendedResults=true&hl.fl=item_title+item_desc&json.nl=map&spellcheck.collate=true&wt=json&spellcheck.onlyMorePopular=false&rows=10&f.item_title.hl.fragsize=105&start=0&q=Obama&f.item_desc.hl.fragsize=110&f.user_key.facet.method=fc&f.cat_title.facet.mincount=1} status=0 QTime=42

Thanks for any info or help.
Michael -- View this message in context: http://old.nabble.com/hits%3DXXX-not-always-there-in-solr.log.*-file-%21--tp27080137p27080137.html Sent from the Solr - User mailing list archive at Nabble.com.
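For anyone scraping these logs the same way, the practical consequence is that hits= has to be treated as an optional field. A minimal Python sketch (the regex and field names are just an assumption about this log layout, not anything Solr ships):

```python
import re

# hits= may be absent (e.g. for field-collapsed requests), so make it optional.
LOG_RE = re.compile(
    r"path=(?P<path>\S+).*?"
    r"(?:hits=(?P<hits>\d+)\s+)?"
    r"status=(?P<status>\d+)\s+QTime=(?P<qtime>\d+)"
)

def parse_line(line):
    """Return (hits, qtime) for a request log line; hits is None when missing."""
    m = LOG_RE.search(line)
    if not m:
        return None
    hits = int(m.group("hits")) if m.group("hits") else None
    return hits, int(m.group("qtime"))
```

Any aggregation built on top of this then has to decide what a None hit count means for the analysis (skip the request, or count it separately).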
Re: hits=XXX not always there in solr.log.* file?!? collapse field related?
Update: from my further investigation, it appears that any time I use the field collapsing feature (I am running the field collapse patch on 1.4), the hits= count is not shown in the log. Can anyone confirm?

michael8 wrote: Hi, I'm puzzled by this issue and was wondering if anyone knows why. Basically I am trying to get hit counts from my solr.log.* files for analysis purposes. [...]

Michael -- View this message in context: http://old.nabble.com/hits%3DXXX-not-always-there-in-solr.log.*-file-%21--tp27080137p27080234.html Sent from the Solr - User mailing list archive at Nabble.com.
How to get Solr 1.4 to replicate spellcheck directories as well?
I'm currently using Solr 1.4 with its built-in solr.ReplicationHandler enabled in solrconfig.xml for a master and slave as follows:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="enable">${enable.master:false}</str>
      <str name="replicateAfter">commit</str>
      <str name="replicateAfter">startup</str>
      <str name="confFiles">schema.xml,protwords.txt,spellings.txt,stopwords.txt,synonyms.txt</str>
    </lst>
    <lst name="slave">
      <str name="enable">${enable.slave:false}</str>
      <str name="masterUrl">http://searchhost:8983/solr/items/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>

Everything in the index is replicated perfectly except that my spellcheck directories are not being replicated. Here is my spellcheck config in solrconfig.xml:

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpell</str>
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spell</str>
      <str name="spellcheckIndexDir">./spellchecker1</str>
      <str name="buildOnCommit">false</str>
    </lst>
    <lst name="spellchecker">
      <str name="name">jarowinkler</str>
      <str name="field">spell</str>
      <!-- Use a different Distance Measure -->
      <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
      <str name="spellcheckIndexDir">./spellchecker2</str>
      <str name="buildOnCommit">false</str>
    </lst>
    <lst name="spellchecker">
      <str name="classname">solr.FileBasedSpellChecker</str>
      <str name="name">file</str>
      <str name="sourceLocation">spellings.txt</str>
      <str name="characterEncoding">UTF-8</str>
      <str name="spellcheckIndexDir">./spellcheckerFile</str>
      <str name="buildOnCommit">false</str>
    </lst>
  </searchComponent>

I have set buildOnCommit to 'false', and instead have a separate cron job build my spellcheck dictionaries on a nightly basis. Is there a way to tell Solr to replicate the spellcheck files too? Is my setting buildOnCommit to 'false' causing my spellcheck files to not replicate? I would think that after the nightly build is triggered and done (via cron) the spellcheck files would be replicated, but that is not the case.
Thanks for any help or info. Michael -- View this message in context: http://old.nabble.com/How-to-get-Solr-1.4-to-replicate-spellcheck-directories-as-well--tp26812569p26812569.html Sent from the Solr - User mailing list archive at Nabble.com.
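For context on the nightly rebuild mentioned above, such a cron job typically just issues a request with spellcheck.build=true against each dictionary. A sketch of constructing that request URL (the host, core, and dictionary names here are illustrative assumptions, not taken from the poster's setup):

```python
from urllib.parse import urlencode

def spellcheck_build_url(base, dictionary):
    """Build the URL a cron job could hit to rebuild one spellcheck dictionary.

    spellcheck.build=true asks SpellCheckComponent to (re)build the index
    under the configured spellcheckIndexDir for the named dictionary.
    """
    params = urlencode({
        "q": "nothing",              # throwaway query; only the build side effect matters
        "rows": 0,
        "spellcheck": "true",
        "spellcheck.dictionary": dictionary,
        "spellcheck.build": "true",
    })
    return base + "/select?" + params
```

One such request per configured dictionary (default, jarowinkler, file) would be needed.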
Re: field collapse using 'adjacent' 'includeCollapsedDocs' + 'sort' query field
Hi Martijn, Thanks for your insight on collapsedDocs, and for explaining what I need to modify to get the functionality I want. Michael

Martijn v Groningen wrote: Hi Michael, What you are saying seems logical, but that is currently not the case with the collapsedDocs functionality. This functionality was built with computing aggregated statistics in mind, and not really to have a separate collapse-group search result. Although the collapsed documents are collected in the order they appear in the search result (only if collapse.type is adjacent), they are not saved in the order they appear. If you really need to have the collapse-group search result in the order the documents were collapsed, you need to tweak the code. What you can do is change the CollapsedDocumentCollapseCollector class in the DocumentFieldsCollapseCollectorFactory.java source file. Currently the document ids are stored inside an OpenBitSet per collapse group. You can change that into an ArrayList<Integer>, for example. That way the order in which the documents were collapsed is preserved. I think the downside of this change will be an increase in memory usage: OpenBitSet is more memory-efficient than an ArrayList of integers. I think that this will only be a real problem when the collapse groups become very large. I hope this answers your question. Martijn

2009/11/14 michael8 mich...@saracatech.com: Hi, This almost seems like a bug, but I can't be sure, so I'm seeking confirmation. Basically I am building a site that presents search results in reverse chronological order. I am also leveraging the field collapse feature so that I can group results using 'adjacent' mode and have solr return the collapsed results as well via 'includeCollapsedDocs'. My collapsing field is a custom grouping_id that I have specified. What I'm noticing is that my search results are coming back in the correct order by descending time (via the 'sort' param in the main query) as expected. However, the results returned within the 'collapsedDocs' section via 'includeCollapsedDocs' are not in the same descending time order. My question is: shouldn't the collapsedDocs results also be in the same 'sort' order and key I have specified in the overall query, particularly since 'adjacent' mode is enabled, which would mean results that are 'adjacent' in the sort order of the results? I'm using Solr 1.4.0 + the field collapse patch as of 10/27/2009. Thanks, Michael

-- View this message in context: http://old.nabble.com/field-collapse-%27includeCollapsedDocs%27-doesn%27t-return-results-within-%27collapsedDocs%27-in-%27sort%27-order-specified-tp26351840p26360433.html Sent from the Solr - User mailing list archive at Nabble.com.
field collapse using 'adjacent' 'includeCollapsedDocs' + 'sort' query field
Hi, This almost seems like a bug, but I can't be sure, so I'm seeking confirmation. Basically I am building a site that presents search results in reverse chronological order. I am also leveraging the field collapse feature so that I can group results using 'adjacent' mode and have solr return the collapsed results as well via 'includeCollapsedDocs'. My collapsing field is a custom grouping_id that I have specified. What I'm noticing is that my search results are coming back in the correct order by descending time (via the 'sort' param in the main query) as expected. However, the results returned within the 'collapsedDocs' section via 'includeCollapsedDocs' are not in the same descending time order. My question is: shouldn't the collapsedDocs results also be in the same 'sort' order and key I have specified in the overall query, particularly since 'adjacent' mode is enabled, which would mean results that are 'adjacent' in the sort order of the results? I'm using Solr 1.4.0 + the field collapse patch as of 10/27/2009. Thanks, Michael -- View this message in context: http://old.nabble.com/field-collapse-using-%27adjacent%27---%27includeCollapsedDocs%27-%2B-%27sort%27-query-field-tp26351840p26351840.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: sanitizing/filtering query string for security
Thanks guys for your input and suggestions! Michael

Otis Gospodnetic wrote: Word of warning: careful with q.alt=*:* if you are dealing with large indices! :) Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR

- Original Message From: Alexey Serba ase...@gmail.com To: solr-user@lucene.apache.org Sent: Mon, November 9, 2009 5:23:52 PM Subject: Re: sanizing/filtering query string for security

> BTW, I have not used the DisMax handler yet, but does it handle *:* properly?
See the q.alt DisMax parameter: http://wiki.apache.org/solr/DisMaxRequestHandler#q.alt You can specify q.alt=*:* and q as an empty string to get all results.

> do you care if users issue this query
I allow users to issue an empty search and get all results with all facets / etc. It's a nice navigation UI, btw.

> Basically given my UI, I'm trying to *hide* the total count from users searching for *everything*
If you don't specify the q.alt parameter then Solr returns zero results for an empty search; *:* won't work either.

> though this syntax has helped me debug/monitor the state of my search doc pool size.
See q.alt. Alex

On Tue, Nov 10, 2009 at 12:59 AM, michael8 wrote: Sounds like a nice approach you have done. [...]

-- View this message in context: http://old.nabble.com/sanizing-filtering-query-string-for-security-tp21516844p26283657.html Sent from the Solr - User mailing list archive at Nabble.com.
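One concrete sanitizing approach is to escape, rather than reject, the characters that are special in the Lucene query syntax. A sketch in Python (the character list follows Lucene's documented query syntax specials; && and || are two-character operators, but escaping each character individually is also safe):

```python
# Characters with special meaning in the Lucene query syntax.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\')

def escape_query(user_input: str) -> str:
    """Backslash-escape Lucene special characters in raw user input."""
    return "".join("\\" + c if c in LUCENE_SPECIAL else c for c in user_input)
```

After escaping, a query like `*:*` becomes a literal term search rather than a match-all, which also addresses the hide-the-total-count concern discussed above.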
Re: sanitizing/filtering query string for security
Hi Julian, I saw your post on exactly the question I have. I'm curious if you got any response directly, or have figured out a way to do this by now that you could share? I'm in the same situation, trying to 'sanitize' the query string coming in before handing it to solr. I do see that characters like : could break the query, but I am curious if anyone has come up with a general solution, as I think this must be a fairly common problem for any solr deployment to tackle. Thanks, Michael Julian Davchev wrote: Hi, Is there anything special that can be done to sanitize user input before it is passed as a query to solr? Not allowing * and ? as the first char is the only thing I can think of right now. Is there anything else it should handle? I am not able to find any relevant documentation. -- View this message in context: http://old.nabble.com/sanizing-filtering-query-string-for-security-tp21516844p26271891.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: sanitizing/filtering query string for security
Sounds like a nice approach you have done. BTW, I have not used the DisMax handler yet, but does it handle *:* properly? IOW, do you care if users issue this query, or does DisMax treat this query string differently than the standard request handler? Basically, given my UI, I'm trying to *hide* the total count from users searching for *everything*, though this syntax has helped me debug/monitor the state of my search doc pool size. Thanks, Michael

Alexey-34 wrote: I added some kind of pre- and post-processing of Solr results for this. I.e., if I find a fieldname specified in the query string in the form fieldname:term, then I pass the query string to the standard request handler, otherwise I use DisMaxRequestHandler (DisMaxRequestHandler doesn't break on the query, at least I haven't seen it yet). If the standard request handler throws an error (invalid field, too many clauses, etc.) then I pass the original query to the DisMax request handler. Alex

On Mon, Nov 9, 2009 at 10:05 PM, michael8 mich...@saracatech.com wrote: Hi Julian, I saw your post on exactly the question I have. [...]
-- View this message in context: http://old.nabble.com/sanizing-filtering-query-string-for-security-tp21516844p26274459.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: question about collapse.type = adjacent
Hi Martijn, This clarifies it all for me. Thanks a lot! Michael

Martijn v Groningen wrote: Hi Michael, Field collapsing is basically done in two steps. The first step is to get the uncollapsed sorted documents (whether sorted by score or by a field value), and the second step is to apply the collapse algorithm to the uncollapsed documents. So yes, when specifying collapse.type=adjacent the documents get collapsed after the sort has been applied, but this is also the case when not specifying collapse.type=adjacent. I hope this answers your question. Cheers, Martijn

2009/11/2 michael8 mich...@saracatech.com: Hi, I would like to confirm whether 'adjacent' in collapse.type means the documents (with the same collapse field value) are considered adjacent *after* the 'sort' param from the query has been applied, or *before*? I would think it would be *after*, since the collapse feature is primarily meant for presentation use. Thanks, Michael -- View this message in context: http://old.nabble.com/question-about-collapse.type-%3D-adjacent-tp26157114p26157114.html Sent from the Solr - User mailing list archive at Nabble.com. -- Met vriendelijke groet, Martijn van Groningen

-- View this message in context: http://old.nabble.com/question-about-collapse.type-%3D-adjacent-tp26157114p26189401.html Sent from the Solr - User mailing list archive at Nabble.com.
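Martijn's two-step description (sort first, then collapse) can be mimicked with a toy sketch: walk the already-sorted list and fold runs of adjacent documents that share the collapse field value. This illustrates the semantics only, not the patch's actual code:

```python
from itertools import groupby

def collapse_adjacent(sorted_docs, field):
    """Keep only the head document of each run of adjacent equal field values.

    The input must already be sorted (step one); collapsing is then applied
    on top of that order (step two), matching collapse.type=adjacent.
    """
    return [next(run) for _key, run in groupby(sorted_docs, key=lambda d: d[field])]

# Already sorted, e.g. by descending item_pubDate:
docs = [
    {"id": 1, "grouping_id": "a"},
    {"id": 2, "grouping_id": "a"},
    {"id": 3, "grouping_id": "b"},
    {"id": 4, "grouping_id": "a"},  # not adjacent to ids 1-2, so it survives
]
```

With adjacent collapsing only id 2 is folded away here; a non-adjacent collapse would also fold id 4 into the first group.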
Re: apply a patch on solr
Perfect. This is what I needed to know instead of patching 'in the dark'. Good thing an SVN revision cuts across all files like a tag. Thanks Mike! Michael

cambridgemike wrote: You can see what revision the patch was written for at the top of the patch; it will look like this:

  Index: org/apache/solr/handler/MoreLikeThisHandler.java
  ===================================================================
  --- org/apache/solr/handler/MoreLikeThisHandler.java (revision 772437)
  +++ org/apache/solr/handler/MoreLikeThisHandler.java (working copy)

Now check out revision 772437 using the --revision switch in svn, patch away, and then svn up to make sure everything merges cleanly. This is a good guide to follow as well: http://www.mail-archive.com/solr-user@lucene.apache.org/msg10189.html cheers, -mike

On Mon, Nov 2, 2009 at 3:55 PM, michael8 mich...@saracatech.com wrote: Hi, First, I'd like to beg pardon for my novice question on patching solr (1.4). What I'd like to know is: given a patch, like the one for collapse field, how would one go about knowing what solr source that patch is meant for, since this is a source-level patch? Wouldn't the exact versions of the set of java files to be patched be critical for the patch to work properly? So far what I have done is to pull the latest collapse field patch down from http://issues.apache.org/jira/browse/SOLR-236 (field-collapse-5.patch), then svn up to the latest trunk from http://svn.apache.org/repos/asf/lucene/solr/trunk/, then patch and build. Intuitively I was thinking I should be doing svn up to a specific revision/tag instead of just the latest. So far everything seems fine, but I just want to make sure I'm doing the right thing and not just being lucky. Thanks, Michael

-- View this message in context: http://old.nabble.com/apply-a-patch-on-solr-tp26157827p26189573.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: apply a patch on solr
Hmmm, perhaps I jumped the gun. I just looked over the field collapse patch for SOLR-236, and each file listed in the patch has its own revision #. E.g. from field-collapse-5.patch:

  --- src/java/org/apache/solr/core/SolrConfig.java (revision 824364)
  --- src/solrj/org/apache/solr/client/solrj/response/QueryResponse.java (revision 816372)
  --- src/solrj/org/apache/solr/client/solrj/SolrQuery.java (revision 823653)
  --- src/java/org/apache/solr/search/SolrIndexSearcher.java (revision 794328)
  --- src/java/org/apache/solr/search/DocSetHitCollector.java (revision 794328)

Unless there is a better way, it seems like I would need to do svn up --revision ... for each of the files to be patched and then apply the patch? This seems error-prone and tedious. Am I missing something simpler here? Michael

michael8 wrote: Perfect. This is what I needed to know instead of patching 'in the dark'. [...]

-- View this message in context: http://old.nabble.com/apply-a-patch-on-solr-tp26157827p26190563.html Sent from the Solr - User mailing list archive at Nabble.com.
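Rather than reading the headers by eye, the per-file revisions can be pulled out of the patch mechanically; checking out the newest of them and letting svn merge the older files is one pragmatic option. A sketch (the inlined patch excerpt is abbreviated from the message above):

```python
import re

def patch_revisions(patch_text):
    """Map each file in an SVN-style patch to the revision its '---' header records."""
    pattern = r"^--- (\S+)\s+\(revision (\d+)\)"
    return dict(re.findall(pattern, patch_text, re.MULTILINE))

sample = """\
--- src/java/org/apache/solr/core/SolrConfig.java (revision 824364)
--- src/java/org/apache/solr/search/SolrIndexSearcher.java (revision 794328)
"""
```

`max(patch_revisions(sample).values(), key=int)` then gives the newest recorded revision as a checkout target.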
question about collapse.type = adjacent
Hi, I would like to confirm whether 'adjacent' in collapse.type means the documents (with the same collapse field value) are considered adjacent *after* the 'sort' param from the query has been applied, or *before*? I would think it would be *after*, since the collapse feature is primarily meant for presentation use. Thanks, Michael -- View this message in context: http://old.nabble.com/question-about-collapse.type-%3D-adjacent-tp26157114p26157114.html Sent from the Solr - User mailing list archive at Nabble.com.
apply a patch on solr
Hi, First, I'd like to beg pardon for my novice question on patching solr (1.4). What I'd like to know is: given a patch, like the one for collapse field, how would one go about knowing what solr source that patch is meant for, since this is a source-level patch? Wouldn't the exact versions of the set of java files to be patched be critical for the patch to work properly? So far what I have done is to pull the latest collapse field patch down from http://issues.apache.org/jira/browse/SOLR-236 (field-collapse-5.patch), then svn up to the latest trunk from http://svn.apache.org/repos/asf/lucene/solr/trunk/, then patch and build. Intuitively I was thinking I should be doing svn up to a specific revision/tag instead of just the latest. So far everything seems fine, but I just want to make sure I'm doing the right thing and not just being lucky. Thanks, Michael -- View this message in context: http://old.nabble.com/apply-a-patch-on-solr-tp26157827p26157827.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: dih.last_index_time - exactly what time is this capturing?
Thanks for your clarification Shalin. Given your explanation, would you agree that there is still a small window (however small it may be) where some documents could be missed in the next delta using dih.last_index_time if the data source adds or updates documents very frequently? I.e., in the time between the SQL finishing execution and Solr receiving the data and starting to index, some new/updated documents may have been written to the DB such that their timestamps are slightly before the captured last_index_time? Michael

Shalin Shekhar Mangar wrote: On Sat, Oct 10, 2009 at 1:42 AM, michael8 mich...@saracatech.com wrote: Hi, Does anyone know when exactly the dih.last_index_time in dataimport.properties is captured? E.g. the start of issuing SQL to the data source, the end of executing SQL against the data source to fetch the list of IDs that have changed since the last index, or the end of indexing all changed/new documents? The name seems to imply 'end of indexing all changed/new docs', but I just want to be sure. last_index_time is set to the current date/time before the actual indexing is started. The rationale is not to miss any documents. If we had set last_index_time after the indexing completed, then we might lose rows inserted/modified after the query of the previous import. In the current setup some documents may get re-imported, but because most users have a uniqueKey, it is not a big problem. Also, I noticed a discrepancy between the commented time string and the actual last_index_time value. Is the commented time (#) the time the file was written, vs. the actual last index time? #Fri Oct 09 13:01:57 PDT 2009 item.last_index_time=2009-10-09 12\:58\:10 last_index_time=2009-10-09 12\:58\:10 The commented time is the time at which the property file was written. This is automatically added by Java's Properties class. -- Regards, Shalin Shekhar Mangar.
-- View this message in context: http://www.nabble.com/dih.last_index_time---exacty-what-time-is-this-capturing--tp25827228p25844816.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: dih.last_index_time - exactly what time is this capturing?
That's perfect. Reimporting and reindexing some documents redundantly because of the slight time overlap is a fair price for not losing docs. Thanks Shalin. Michael

Shalin Shekhar Mangar wrote: On Sun, Oct 11, 2009 at 9:46 PM, michael8 mich...@saracatech.com wrote: Thanks for your clarification Shalin. Given your explanation, would you agree that there is still a small window (however small it may be) where some documents could be missed in the next delta using dih.last_index_time if the data source adds or updates documents very frequently? I.e., in the time between the SQL finishing execution and Solr receiving the data to start indexing, some new/updated documents may have been written to the DB such that their timestamps are slightly before the captured last_index_time? The last_index_time is recorded before any SQL queries are fired, so I don't think any rows could be missed. Some could be imported more than once, though. -- Regards, Shalin Shekhar Mangar. -- View this message in context: http://www.nabble.com/dih.last_index_time---exacty-what-time-is-this-capturing--tp25827228p25850464.html Sent from the Solr - User mailing list archive at Nabble.com.
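The ordering Shalin describes — checkpoint first, query second — can be sketched in a few lines; the harmless-duplicate property falls out of the uniqueKey upsert. The function and in-memory store here are toy stand-ins, not DIH code:

```python
from datetime import datetime, timezone

def delta_import(last_index_time, fetch_since, index):
    """Run one delta-import cycle against a toy document store.

    The new checkpoint is taken BEFORE the source is queried, so rows modified
    while the import runs fall into the next delta instead of being lost.
    Rows imported twice are simply overwritten via their uniqueKey ('id').
    """
    checkpoint = datetime.now(timezone.utc)     # recorded first, as DIH does
    for doc in fetch_since(last_index_time):
        index[doc["id"]] = doc                  # uniqueKey upsert: duplicates are harmless
    return checkpoint                           # becomes last_index_time for the next cycle
```

Recording the checkpoint after the loop instead would open exactly the lost-row window discussed in this thread.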
Re: solr 1.4 formats last_index_time for SQL differently than 1.3 ?!?
Thanks Shalin. The patch works well for me too. Michael

Shalin Shekhar Mangar wrote: On Thu, Oct 8, 2009 at 1:38 AM, michael8 mich...@saracatech.com wrote: 2 things I noticed that are different from 1.3 to 1.4 for DataImport: 1. There are now 2 datetime values (per my specific schema, I'm sure) in dataimport.properties vs. only 1 in 1.3 (using the exact same schema). One is 'last_index_time', same as 1.3, and a *new* one (in 1.4) named item.last_index_time, where 'item' is my main and only entity name specified in my data-import.xml. They both have the same value. This was added with SOLR-783 to enable delta imports of entities individually. One can specify the entity name(s) which should be imported. Without this it was not possible to correctly figure out deltas on a per-entity basis. 2. In 1.3, the datetime passed to SQL used to be, e.g., '2009-10-05 14:08:01', but with 1.4 the format becomes 'Mon Oct 05 14:08:01 PDT 2009', with the day of week, name of month, and timezone spelled out. I had an issue with the 1.4 format with MySQL only for the timezone part, but now I have a different solution that avoids using this last index date altogether. I just committed SOLR-1496, so the different date format issue is fixed in trunk. I'm curious though if there's any config setting to pass to DataImportHandler to specify the desired date/time format to use. There is no configuration to change this. However, you can write your own Evaluator to output ${dih.last_index_time} in whatever format you prefer. -- Regards, Shalin Shekhar Mangar. -- View this message in context: http://www.nabble.com/solr-1.4-formats-last_index_time-for-SQL-differently-than-1.3--%21--tp25776496p25826421.html Sent from the Solr - User mailing list archive at Nabble.com.
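For reference, normalizing the Date.toString()-style value from the 1.4 nightly back to the SQL-friendly form the thread wants is mechanical. A sketch (the zone name is simply dropped, because strptime's %Z handling of arbitrary zone abbreviations is unreliable):

```python
from datetime import datetime

def normalize(date_tostring_value):
    """Convert 'Mon Oct 05 14:08:01 PDT 2009' to '2009-10-05 14:08:01'.

    The timezone abbreviation is stripped rather than parsed, mirroring the
    complaint in this thread that MySQL choked only on the timezone part.
    """
    parts = date_tostring_value.split()
    cleaned = " ".join(parts[:4] + parts[5:])   # drop the zone field ('PDT')
    dt = datetime.strptime(cleaned, "%a %b %d %H:%M:%S %Y")
    return dt.strftime("%Y-%m-%d %H:%M:%S")
```

A custom Evaluator, as Shalin suggests, is the in-Solr equivalent of this transformation.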
dih.last_index_time - exactly what time is this capturing?
Hi, Does anyone know exactly when the dih.last_index_time in dataimport.properties is captured? E.g. at the start of issuing SQL to the data source, at the end of executing the SQL that fetches the list of IDs changed since the last index, or at the end of indexing all changed/new documents? The name seems to imply 'end of indexing all changed/new docs', but I just want to be sure. Also, I noticed a discrepancy between the commented time string and the actual last_index_time value. Is the commented time (#) the time the file was written, vs. the actual last index time? #Fri Oct 09 13:01:57 PDT 2009 item.last_index_time=2009-10-09 12\:58\:10 last_index_time=2009-10-09 12\:58\:10 Thanks, Michael -- View this message in context: http://www.nabble.com/dih.last_index_time---exacty-what-time-is-this-capturing--tp25827228p25827228.html Sent from the Solr - User mailing list archive at Nabble.com.
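A side note on the escaped colons in those values: dataimport.properties is a java.util.Properties file, and Properties escapes ':' (and '=') in values because both characters can act as key/value delimiters. A minimal Python sketch of reading one such line (not a full Properties parser, just enough for these specific values):

```python
raw = r"item.last_index_time=2009-10-09 12\:58\:10"

# Split on the first '=' (the key/value delimiter in this line),
# then undo the Properties-style backslash escaping of ':'.
key, _, value = raw.partition("=")
value = value.replace("\\:", ":")

print(key)    # item.last_index_time
print(value)  # 2009-10-09 12:58:10
```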
Re: solr 1.4 formats last_index_time for SQL differently than 1.3 ?!?
2 things I noticed that are different from 1.3 to 1.4 for DataImport: 1. there are now 2 datetime values (per my specific schema, I'm sure) in dataimport.properties vs. only 1 in 1.3 (using the exact same schema). One is 'last_index_time', same as in 1.3, and a *new* one (in 1.4) named item.last_index_time, where 'item' is my main and only entity name specified in my data-import.xml. They both have the same value. 2. in 1.3, the datetime passed to SQL used to be, e.g., '2009-10-05 14:08:01', but with 1.4 the format becomes 'Mon Oct 05 14:08:01 PDT 2009', with the day of week, name of month, and timezone spelled out. I had an issue with the 1.4 format with MySQL only for the timezone part, but now I have a different solution that avoids using this last index date altogether. I'm curious though whether there's any config setting to pass to DataImportHandler to specify the desired date/time format to use. Michael Noble Paul നോബിള് नोब्ळ्-2 wrote: Really? I don't remember that being changed. What difference do you notice? On Wed, Oct 7, 2009 at 2:30 AM, michael8 mich...@saracatech.com wrote: Just looking for confirmation from others, but it appears that the formatting of last_index_time from dataimport.properties (using DataImportHandler) is different in 1.4 vs. 1.3. I was troubleshooting why delta imports are no longer working for me after moving over to solr 1.4 (10/2 nightly) and noticed that the format is different. Michael -- View this message in context: http://www.nabble.com/solr-1.4-formats-last_index_time-for-SQL-differently-than-1.3--%21--tp25776496p25776496.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Principal Engineer| AOL | http://aol.com -- View this message in context: http://www.nabble.com/solr-1.4-formats-last_index_time-for-SQL-differently-than-1.3--%21--tp25776496p25793468.html Sent from the Solr - User mailing list archive at Nabble.com.
solr 1.4 formats last_index_time for SQL differently than 1.3 ?!?
Just looking for confirmation from others, but it appears that the formatting of last_index_time from dataimport.properties (using DataImportHandler) is different in 1.4 vs. 1.3. I was troubleshooting why delta imports are no longer working for me after moving over to solr 1.4 (10/2 nightly) and noticed that the format is different. Michael -- View this message in context: http://www.nabble.com/solr-1.4-formats-last_index_time-for-SQL-differently-than-1.3--%21--tp25776496p25776496.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: download pre-release nightly solr 1.4
markrmiller wrote: michael8 wrote: markrmiller wrote: michael8 wrote: Hi, I know Solr 1.4 is going to be released any day now pending the Lucene 2.9 release. Is there anywhere one can download a pre-release nightly build of Solr 1.4 just for getting familiar with new features (e.g. field collapsing)? Thanks, Michael You can download nightlies here: http://people.apache.org/builds/lucene/solr/nightly/ field collapsing won't be in 1.4 though. You have to build from svn after applying the patch for that. -- - Mark http://www.lucidimagination.com Thanks for the info Mark. If field collapsing is a patch, can I apply the patch against 1.3 then? Thanks again. Michael Not likely - it has to apply to the current code. If you can find an old patch that works with 1.3 (not sure when the patches for that started), it's possible. But you would be using a very old patch (not sure there is one that applies to 1.3 trunk either, but you could check). -- - Mark http://www.lucidimagination.com Thanks again Mark. I think it's better that I go with patching 1.4, once it's ready, for the field-collapsing feature. Michael -- View this message in context: http://www.nabble.com/download-pre-release-nightly-solr-1.4-tp25590281p25649529.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: download pre-release nightly solr 1.4
markrmiller wrote: michael8 wrote: Hi, I know Solr 1.4 is going to be released any day now pending the Lucene 2.9 release. Is there anywhere one can download a pre-release nightly build of Solr 1.4 just for getting familiar with new features (e.g. field collapsing)? Thanks, Michael You can download nightlies here: http://people.apache.org/builds/lucene/solr/nightly/ field collapsing won't be in 1.4 though. You have to build from svn after applying the patch for that. -- - Mark http://www.lucidimagination.com Thanks for the info Mark. If field collapsing is a patch, can I apply the patch against 1.3 then? Thanks again. Michael -- View this message in context: http://www.nabble.com/download-pre-release-nightly-solr-1.4-tp25590281p25615553.html Sent from the Solr - User mailing list archive at Nabble.com.
download pre-release nightly solr 1.4
Hi, I know Solr 1.4 is going to be released any day now pending the Lucene 2.9 release. Is there anywhere one can download a pre-release nightly build of Solr 1.4 just for getting familiar with new features (e.g. field collapsing)? Thanks, Michael -- View this message in context: http://www.nabble.com/download-pre-release-nightly-solr-1.4-tp25590281p25590281.html Sent from the Solr - User mailing list archive at Nabble.com.
Looking for suggestion of WordDelimiter filter config and 'ALMA awards'
Hi, I have a situation that I believe is very common, but I was curious whether anyone knows the right way to solve it. I have a document with 'ALMA awards' in it. However, when a user searches for 'aLMA awards', no results are found, whereas searching for 'alma awards' or 'ALMA awards' brings back the right results as expected. I immediately went to solr/admin/analysis to see what is going on with the indexing of 'ALMA awards' and the query parsing of 'aLMA awards', and it looks like WordDelimiter is the one causing the mismatch. WordDelimiter, with splitOnCaseChange=1, will turn my search query 'aLMA awards' into 'a', 'LMA', and 'awards', which is exactly what splitOnCaseChange does. Is there a proper way to handle this kind of situation, where the user simply got the case wrong for the 1st letter (or maybe n letters)? I like the benefits that the WordDelimiter filter with splitOnCaseChange provides, but I am not sure of the proper way to solve this without compromising the other benefits this filter provides. I also tried preserveOriginal=1, hoping that 'aLMA' would be preserved and later become all-lowercase 'alma' via another filter, but with no luck. P.S.: I am basically using the standard config for the 'text' fieldtype for my default search field. (solr 1.3) Thanks, Michael -- View this message in context: http://www.nabble.com/Looking-for-suggestion-of-WordDelimiter-filter-config-and-%27ALMA-awards%27-tp25591381p25591381.html Sent from the Solr - User mailing list archive at Nabble.com.
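One configuration often suggested for this case is to keep preserveOriginal=1 on the WordDelimiterFilter and make sure the LowerCaseFilter runs after it, in both the index-time and query-time analyzers, so that query-side 'aLMA' survives as a whole token and is then lowercased to 'alma', matching the indexed term. This is a sketch, not tested against the poster's schema (the fieldType name and tokenizer choice are illustrative), and the index must be rebuilt after changing index-time analysis:

```xml
<!-- Sketch of a schema.xml fieldType; name and tokenizer are illustrative. -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- preserveOriginal="1" keeps the unsplit token (e.g. "aLMA")
         alongside the case-change splits "a" and "LMA"... -->
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="1" preserveOriginal="1"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <!-- ...and lowercasing AFTER the word delimiter turns "aLMA" into "alma". -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```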
Re: standard requestHandler components
Hi Jay, I got it from reading your response. I did browse around in solrconfig.xml but could not find any components configured for 'standard', and didn't realize that there are 'defaults' hardwired. Thanks for your quick, detailed response and also your additional tip on the spellcheck config. You saved me lots of time on trial and error. Regards, Michael Jay Hill wrote: RequestHandlers are configured in solrconfig.xml. If no components are explicitly declared in the request handler config then the defaults are used. They are: - QueryComponent - FacetComponent - MoreLikeThisComponent - HighlightComponent - StatsComponent - DebugComponent If you want a custom list of components (either omitting defaults or adding custom ones) you can specify the components for a handler directly: <arr name="components"> <str>query</str> <str>facet</str> <str>mlt</str> <str>highlight</str> <str>debug</str> <str>someothercomponent</str> </arr> You can add components before or after the main ones like this: <arr name="first-components"> <str>mycomponent</str> </arr> <arr name="last-components"> <str>myothercomponent</str> </arr> and that's how the spellcheck component can be added: <arr name="last-components"> <str>spellcheck</str> </arr> Note that a component (except the defaults) must be configured in solrconfig.xml with the name used in the <str> element as well. Have a look at the solrconfig.xml in the example directory (.../example/solr/conf/) for examples of how to set up the spellcheck component and how the request handlers are configured. -Jay http://www.lucidimagination.com On Fri, Sep 11, 2009 at 3:04 PM, michael8 mich...@saracatech.com wrote: Hi, I have a newbie question about the 'standard' requestHandler in solrconfig.xml. What I'd like to know is where the config information for this requestHandler is kept. When I go to http://localhost:8983/solr/admin, I see the following info, but am curious where are the supposedly 'chained' components (e.g.
QueryComponent, FacetComponent, MoreLikeThisComponent) configured for this requestHandler. I see timing and process debug output from these components with debugQuery=true, so somewhere these components must have been configured for this 'standard' requestHandler. name:standard class: org.apache.solr.handler.component.SearchHandler version:$Revision: 686274 $ description:Search using components: org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.DebugComponent, stats: handlerStart : 1252703405335 requests : 3 errors : 0 timeouts : 0 totalTime : 201 avgTimePerRequest : 67.0 avgRequestsPerSecond : 0.015179728 What I'd like to do with this understanding is properly integrate the spellcheck component into the standard requestHandler as suggested in a Solr spellcheck example. Thanks for any info in advance. Michael -- View this message in context: http://www.nabble.com/%22standard%22-requestHandler-components-tp25409075p25409075.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/%22standard%22-requestHandler-components-tp25409075p25414682.html Sent from the Solr - User mailing list archive at Nabble.com.
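Putting the pieces from this thread together, a solrconfig.xml sketch of the resulting handler might look like the following (modeled on the example config shipped with Solr; the spellcheck searchComponent itself must be declared separately under that name, and the defaults shown are illustrative):

```xml
<!-- Sketch based on the example solrconfig.xml; a searchComponent named
     "spellcheck" must be defined elsewhere in solrconfig.xml. -->
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
  <!-- Run the spellcheck component after the default component chain
       (query, facet, mlt, highlight, stats, debug). -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```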
standard requestHandler components
Hi, I have a newbie question about the 'standard' requestHandler in solrconfig.xml. What I'd like to know is where the config information for this requestHandler is kept. When I go to http://localhost:8983/solr/admin, I see the following info, but am curious where the supposedly 'chained' components (e.g. QueryComponent, FacetComponent, MoreLikeThisComponent) are configured for this requestHandler. I see timing and process debug output from these components with debugQuery=true, so somewhere these components must have been configured for this 'standard' requestHandler. name:standard class: org.apache.solr.handler.component.SearchHandler version:$Revision: 686274 $ description:Search using components: org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.DebugComponent, stats: handlerStart : 1252703405335 requests : 3 errors : 0 timeouts : 0 totalTime : 201 avgTimePerRequest : 67.0 avgRequestsPerSecond : 0.015179728 What I'd like to do with this understanding is properly integrate the spellcheck component into the standard requestHandler as suggested in a Solr spellcheck example. Thanks for any info in advance. Michael -- View this message in context: http://www.nabble.com/%22standard%22-requestHandler-components-tp25409075p25409075.html Sent from the Solr - User mailing list archive at Nabble.com.