hits=XXX not always there in solr.log.* file?!?

2010-01-08 Thread michael8

Hi,
I'm puzzled by this issue and was wondering if anyone knows why.  Basically
I am trying to get hit counts from my solr.log.* files for analysis purposes.
However, I noticed that for some requests no hits=xyz value is shown.

Here are 2 example log snippets from my solr.log.2010_01_07 file, one with
'hits=' count and one without from the same given solr instance:

-- query WITH 'hits=xyz' count in log --
INFO: [items] webapp=/solr path=/select
params={spellcheck=true&facet=true&sort=item_pubDate+desc&facet.limit=21&hl=true&version=2.2&f.cat_title.facet.sort=index&f.credibility.facet.sort=index&spellcheck.count=1&facet.field={!ex%3Dscat}cat_title&facet.field=user_key&facet.field={!ex%3Dscred}credibility&fq={!tag%3Dscred}credibility:[1+TO+3]&fq=grouping_id:AMS-141002-2010-01-07&fq=-item_id:127272858&fq=-item_id:127272859&f.cat_title.facet.method=fc&f.user_key.facet.mincount=1&spellcheck.extendedResults=true&json.nl=map&hl.fl=item_title+item_desc&wt=json&spellcheck.collate=true&spellcheck.onlyMorePopular=false&rows=100&f.item_title.hl.fragsize=105&start=0&q=Obama&f.item_desc.hl.fragsize=110&f.user_key.facet.method=fc&f.cat_title.facet.mincount=1}
hits=755 status=0 QTime=290 

--- query WITHOUT hits=xyz count in log --
INFO: [items] webapp=/solr path=/select
params={spellcheck=true&collapse.info.doc=false&facet=true&sort=item_pubDate+desc&facet.limit=21&hl=true&f.cat_title.facet.sort=index&version=2.2&collapse.field=grouping_id&f.credibility.facet.sort=index&spellcheck.count=1&facet.field={!ex%3Dscat}cat_title&facet.field=user_key&facet.field={!ex%3Dscred}credibility&fq={!tag%3Dscred}credibility:3&f.cat_title.facet.method=fc&collapse.threshold=2&f.user_key.facet.mincount=1&spellcheck.extendedResults=true&hl.fl=item_title+item_desc&json.nl=map&spellcheck.collate=true&wt=json&spellcheck.onlyMorePopular=false&rows=10&f.item_title.hl.fragsize=105&start=0&q=Obama&f.item_desc.hl.fragsize=110&f.user_key.facet.method=fc&f.cat_title.facet.mincount=1}
status=0 QTime=42 

Thanks for any info or help.

Michael
-- 
View this message in context: 
http://old.nabble.com/hits%3DXXX-not-always-there-in-solr.log.*-file-%21--tp27080137p27080137.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: hits=XXX not always there in solr.log.* file?!? collapse field related?

2010-01-08 Thread michael8

Update: from further investigation, it appears that any time I use the field
collapse feature (I am running the field collapse patch on 1.4), the hits=
count is not shown in the log.  Can anyone confirm?
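
For anyone scraping these logs, a minimal sketch of a parser that treats hits= as
optional (so collapse requests without it are still counted) might look like the
following. It assumes each request is logged on a single line; the class name and
output format are made up for illustration:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Sketch: scan a solr.log file and print the hit count per /select request,
    // treating the hits=NNN token as optional since some requests omit it.
    public class HitCountScanner {
        // Matches the tail of a request line, e.g. "hits=755 status=0 QTime=290"
        private static final Pattern TAIL =
            Pattern.compile("(?:hits=(\\d+)\\s+)?status=(\\d+)\\s+QTime=(\\d+)");

        public static void main(String[] args) throws Exception {
            BufferedReader in = new BufferedReader(new FileReader(args[0]));
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.contains("path=/select")) {
                    continue; // only query requests are of interest
                }
                Matcher m = TAIL.matcher(line);
                if (m.find()) {
                    String hits = m.group(1); // null when hits= is absent
                    System.out.println(hits != null ? "hits=" + hits : "hits not logged");
                }
            }
            in.close();
        }
    }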


michael8 wrote:
 
 Hi,
 I'm puzzled by this issue and was wondering if anyone knows why.
 Basically I am trying to get hit counts from my solr.log.* files for
 analysis purposes.  However, I noticed that for some requests no hits=xyz
 value is shown.
 
 Here are 2 example log snippets from my solr.log.2010_01_07 file, one with
 'hits=' count and one without from the same given solr instance:
 
 -- query WITH 'hits=xyz' count in log --
 INFO: [items] webapp=/solr path=/select
 params={spellcheck=true&facet=true&sort=item_pubDate+desc&facet.limit=21&hl=true&version=2.2&f.cat_title.facet.sort=index&f.credibility.facet.sort=index&spellcheck.count=1&facet.field={!ex%3Dscat}cat_title&facet.field=user_key&facet.field={!ex%3Dscred}credibility&fq={!tag%3Dscred}credibility:[1+TO+3]&fq=grouping_id:AMS-141002-2010-01-07&fq=-item_id:127272858&fq=-item_id:127272859&f.cat_title.facet.method=fc&f.user_key.facet.mincount=1&spellcheck.extendedResults=true&json.nl=map&hl.fl=item_title+item_desc&wt=json&spellcheck.collate=true&spellcheck.onlyMorePopular=false&rows=100&f.item_title.hl.fragsize=105&start=0&q=Obama&f.item_desc.hl.fragsize=110&f.user_key.facet.method=fc&f.cat_title.facet.mincount=1}
 hits=755 status=0 QTime=290 
 
 --- query WITHOUT hits=xyz count in log --
 INFO: [items] webapp=/solr path=/select
 params={spellcheck=true&collapse.info.doc=false&facet=true&sort=item_pubDate+desc&facet.limit=21&hl=true&f.cat_title.facet.sort=index&version=2.2&collapse.field=grouping_id&f.credibility.facet.sort=index&spellcheck.count=1&facet.field={!ex%3Dscat}cat_title&facet.field=user_key&facet.field={!ex%3Dscred}credibility&fq={!tag%3Dscred}credibility:3&f.cat_title.facet.method=fc&collapse.threshold=2&f.user_key.facet.mincount=1&spellcheck.extendedResults=true&hl.fl=item_title+item_desc&json.nl=map&spellcheck.collate=true&wt=json&spellcheck.onlyMorePopular=false&rows=10&f.item_title.hl.fragsize=105&start=0&q=Obama&f.item_desc.hl.fragsize=110&f.user_key.facet.method=fc&f.cat_title.facet.mincount=1}
 status=0 QTime=42 
 
 Thanks for any info or help.
 
 Michael
 

-- 
View this message in context: 
http://old.nabble.com/hits%3DXXX-not-always-there-in-solr.log.*-file-%21--tp27080137p27080234.html
Sent from the Solr - User mailing list archive at Nabble.com.



How to get Solr 1.4 to replicate spellcheck directories as well?

2009-12-16 Thread michael8

I'm currently using Solr 1.4 with its built-in solr.ReplicationHandler
enabled in solrconfig.xml for a master and slave as follows:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="enable">${enable.master:false}</str>
      <str name="replicateAfter">commit</str>
      <str name="replicateAfter">startup</str>
      <str name="confFiles">schema.xml,protwords.txt,spellings.txt,stopwords.txt,synonyms.txt</str>
    </lst>
    <lst name="slave">
      <str name="enable">${enable.slave:false}</str>
      <str name="masterUrl">http://searchhost:8983/solr/items/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>

Everything in the index is replicated perfectly except that my spellcheck
directories are not being replicated.  Here is my spellcheck config in
solrconfig.xml:

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpell</str>

    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">spell</str>
      <str name="spellcheckIndexDir">./spellchecker1</str>
      <str name="buildOnCommit">false</str>
    </lst>

    <lst name="spellchecker">
      <str name="name">jarowinkler</str>
      <str name="field">spell</str>
      <!-- Use a different Distance Measure -->
      <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
      <str name="spellcheckIndexDir">./spellchecker2</str>
      <str name="buildOnCommit">false</str>
    </lst>

    <lst name="spellchecker">
      <str name="classname">solr.FileBasedSpellChecker</str>
      <str name="name">file</str>
      <str name="sourceLocation">spellings.txt</str>
      <str name="characterEncoding">UTF-8</str>
      <str name="spellcheckIndexDir">./spellcheckerFile</str>
      <str name="buildOnCommit">false</str>
    </lst>
  </searchComponent>

I have set buildOnCommit to 'false', and instead have a separate cron job that
builds my spellcheck dictionaries on a nightly basis.
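
(For reference, such a nightly job amounts to asking the spellcheck component to
rebuild its dictionaries with spellcheck.build=true; a rough SolrJ sketch follows.
The "/spell" handler name and the core URL are assumptions for illustration, not
something taken from this configuration.)

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    // Rough sketch of a nightly "rebuild the spellcheck dictionaries" job.
    // Assumes a request handler (called "/spell" here) with the spellcheck
    // component attached; the handler name and URL are illustrative only.
    public class NightlySpellcheckBuild {
        public static void main(String[] args) throws Exception {
            SolrServer solr = new CommonsHttpSolrServer("http://searchhost:8983/solr/items");

            SolrQuery q = new SolrQuery("*:*");
            q.setQueryType("/spell");                  // hypothetical handler name
            q.set("spellcheck", "true");
            q.set("spellcheck.build", "true");         // rebuild the index-based dictionary
            q.set("spellcheck.dictionary", "default"); // repeat for each configured dictionary
            q.setRows(0);

            solr.query(q);
        }
    }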

Is there a way to tell Solr to replicate the spellcheck files too?  Is my
setting 'buildOnCommit' to 'false' causing my spellcheck files to not
replicate?  I would have thought that after the nightly build is triggered and
done (via cron) the spellcheck files would be replicated, but that is not the case.

Thanks for any help or info.

Michael

-- 
View this message in context: 
http://old.nabble.com/How-to-get-Solr-1.4-to-replicate-spellcheck-directories-as-well--tp26812569p26812569.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: field collapse using 'adjacent' 'includeCollapsedDocs' + 'sort' query field

2009-11-15 Thread michael8

Hi Martijn,

Thanks for your insight into collapsedDocs, and for pointing out what I need
to modify to get the functionality I want.

Michael


Martijn v Groningen wrote:
 
 Hi Michael,
 
 What you are saying seems logical, but that is currently not the case
 with the collapsedDocs functionality. This functionality was built
 with computing aggregated statistics in mind, and not really to provide a
 separate collapse-group search result. Although the collapsed
 documents are collected in the order they appear in the search result
 (only if collapse.type is adjacent), they are not saved in that order.
 
 If you really need the collapse-group search results in the
 order they were collapsed, you need to tweak the code. What you can do
 is change the CollapsedDocumentCollapseCollector class in the
 DocumentFieldsCollapseCollectorFactory.java source file. Currently the
 document ids are stored inside an OpenBitSet per collapse group. You
 can change that into an ArrayList<Integer>, for example. That way
 the order in which the documents were collapsed is preserved.
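
Schematically, the swap Martijn describes looks like the following. This is an
illustration of the idea only, not the actual patch code; the class and field
names are made up:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.util.OpenBitSet;

    // Illustration of the data-structure swap described above: a bit set loses
    // the collection order, while a list keeps the order in which documents
    // were collapsed into the group. Names are illustrative only.
    class CollapseGroupDocs {
        private final OpenBitSet docIdSet = new OpenBitSet();             // before: compact, order lost
        private final List<Integer> docIdList = new ArrayList<Integer>(); // after: order preserved

        void collect(int docId) {
            docIdSet.set(docId);
            docIdList.add(docId);
        }
    }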
 
 I think the downside of this change will be an increase in memory
 usage. OpenBitSet is more memory efficient than an ArrayList of
 integers. I think this will only be a real problem when the
 collapse groups become very large.
 
 I hope this will answer your question.
 
 Martijn
 
 2009/11/14 michael8 mich...@saracatech.com:

 Hi,

 This almost seems like a bug, but I can't be sure so I'm seeking
 confirmation.  Basically I am building a site that presents search
 results
 in reverse chronological order.  I am also leveraging the field
 collapse
 feature so that I can group results using 'adjacent' mode and have solr
 return the collapsed results as well via 'includeCollapsedDocs'.  My
 collapsing field is a custom grouping_id that I have specified.

 What I'm noticing is that, my search results are coming back in the
 correct
 order by descending time (via 'sort' param in the main query) as
 expected.
 However, the results returned within the 'collapsedDocs' section via
 'includeCollapsedDocs' are not in the same descending time order.

 My question is, shouldn't the collapsedDocs results also be in the same
 'sort' order and key I have specified in the overall query, particularly
 since 'adjacent' mode is enabled, and that would mean results that are
 'adjacent' in the sort order of the results.

 I'm using Solr 1.4.0 + field collapse patch as of 10/27/2009

 Thanks,
 Michael

 --
 View this message in context:
 http://old.nabble.com/field-collapse-using-%27adjacent%27---%27includeCollapsedDocs%27-%2B-%27sort%27-query-field-tp26351840p26351840.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://old.nabble.com/field-collapse-%27includeCollapsedDocs%27-doesn%27t-return-results-within-%27collapsedDocs%27-in-%27sort%27-order-specified-tp26351840p26360433.html
Sent from the Solr - User mailing list archive at Nabble.com.



field collapse using 'adjacent' 'includeCollapsedDocs' + 'sort' query field

2009-11-14 Thread michael8

Hi,

This almost seems like a bug, but I can't be sure so I'm seeking
confirmation.  Basically I am building a site that presents search results
in reverse chronological order.  I am also leveraging the field collapse
feature so that I can group results using 'adjacent' mode and have solr
return the collapsed results as well via 'includeCollapsedDocs'.  My
collapsing field is a custom grouping_id that I have specified.

What I'm noticing is that my search results are coming back in the correct
order by descending time (via 'sort' param in the main query) as expected. 
However, the results returned within the 'collapsedDocs' section via
'includeCollapsedDocs' are not in the same descending time order.  

My question is, shouldn't the collapsedDocs results also be in the same
'sort' order and key I have specified in the overall query, particularly
since 'adjacent' mode is enabled, and that would mean results that are
'adjacent' in the sort order of the results?

I'm using Solr 1.4.0 + field collapse patch as of 10/27/2009

Thanks,
Michael

-- 
View this message in context: 
http://old.nabble.com/field-collapse-using-%27adjacent%27---%27includeCollapsedDocs%27-%2B-%27sort%27-query-field-tp26351840p26351840.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: sanizing/filtering query string for security

2009-11-10 Thread michael8

Thanks guys for your input and suggestions!

Michael


Otis Gospodnetic wrote:
 
 Word of warning:
 Careful with q.alt=*:* if you are dealing with large indices! :)
 
 Otis
 --
 Sematext is hiring -- http://sematext.com/about/jobs.html?mls
 Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
 
 
 
 - Original Message 
 From: Alexey Serba ase...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Mon, November 9, 2009 5:23:52 PM
 Subject: Re: sanizing/filtering query string for security
 
  BTW, I have not used DisMax handler yet, but does it handle *:*
 properly?
 See q.alt DisMax parameter
 http://wiki.apache.org/solr/DisMaxRequestHandler#q.alt
 
 You can specify q.alt=*:* and q as empty string to get all results.
 
  do you care if users issue this query
 I allow users to issue an empty search and get all results with all
 facets / etc. It's a nice navigation UI btw.
 
  Basically given my UI, I'm trying to *hide* the total count from users 
 searching for *everything*
 If you don't specify q.alt parameter then Solr returns zero results
 for empty search. *:* won't work either.
 
  though this syntax has helped me debug/monitor the state of my search
 doc pool 
 size.
 see q.alt
 
 Alex
 
 On Tue, Nov 10, 2009 at 12:59 AM, michael8 wrote:
 
  Sounds like a nice approach you have taken.  BTW, I have not used
 DisMax
  handler yet, but does it handle *:* properly?  IOW, do you care if
 users
  issue this query, or does DisMax treat this query string differently
 than
  standard request handler?  Basically given my UI, I'm trying to *hide*
 the
  total count from users searching for *everything*, though this syntax
 has
  helped me debug/monitor the state of my search doc pool size.
 
  Thanks,
  Michael
 
 
  Alexey-34 wrote:
 
  I added some kind of pre and post processing of Solr results for this,
  i.e.
 
  If I find fieldname specified in query string in form of
  fieldname:term then I pass this query string to standard request
  handler, otherwise use DisMaxRequestHandler ( DisMaxRequestHandler
  doesn't break the query, at least I haven't seen yet ). If standard
  request handler throws error ( invalid field, too many clauses, etc )
  then I pass original query to DisMax request handler.
 
  Alex
 
  On Mon, Nov 9, 2009 at 10:05 PM, michael8 wrote:
 
  Hi Julian,
 
  Saw your post on exactly the question I have.  I'm curious if you got
 any
  response directly, or figured out a way to do this by now that you
 could
  share?  I'm in the same situation trying to 'sanitize' the query
 string
  coming in before handing it to solr.  I do see that characters like
 :
  could break the query, but am curious if anyone has come up with a
  general
  solution as I think this must be a fairly common problem for any solr
  deployment to tackle.
 
  Thanks,
  Michael
 
 
  Julian Davchev wrote:
 
  Hi,
  Is there anything special that can be done for sanitizing user input
  before it is passed as a query to Solr?
  Not allowing * and ? as the first char is the only thing I can think of
  right now. Anything else it should somehow handle?
 
  I am not able to find any relevant document.
 
 
 
  --
  View this message in context:
  
 http://old.nabble.com/sanizing-filtering-query-string-for-security-tp21516844p26271891.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 
  --
  View this message in context: 
 http://old.nabble.com/sanizing-filtering-query-string-for-security-tp21516844p26274459.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 

-- 
View this message in context: 
http://old.nabble.com/sanizing-filtering-query-string-for-security-tp21516844p26283657.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: sanizing/filtering query string for security

2009-11-09 Thread michael8

Hi Julian,

Saw your post on exactly the question I have.  I'm curious if you got any
response directly, or figured out a way to do this by now that you could
share?  I'm in the same situation trying to 'sanitize' the query string
coming in before handing it to solr.  I do see that characters like :
could break the query, but am curious if anyone has come up with a general
solution as I think this must be a fairly common problem for any solr
deployment to tackle.

Thanks,
Michael


Julian Davchev wrote:
 
 Hi,
 Is there anything special that can be done for sanitizing user input
 before it is passed as a query to Solr?
 Not allowing * and ? as the first char is the only thing I can think of
 right now. Anything else it should somehow handle?
 
 I am not able to find any relevant document.
 
 

-- 
View this message in context: 
http://old.nabble.com/sanizing-filtering-query-string-for-security-tp21516844p26271891.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: sanizing/filtering query string for security

2009-11-09 Thread michael8

Sounds like a nice approach you have taken.  BTW, I have not used DisMax
handler yet, but does it handle *:* properly?  IOW, do you care if users
issue this query, or does DisMax treat this query string differently than
standard request handler?  Basically given my UI, I'm trying to *hide* the
total count from users searching for *everything*, though this syntax has
helped me debug/monitor the state of my search doc pool size.

Thanks,
Michael


Alexey-34 wrote:
 
 I added some kind of pre and post processing of Solr results for this,
 i.e.
 
 If I find a fieldname specified in the query string in the form of
 fieldname:term, then I pass the query string to the standard request
 handler, otherwise I use the DisMaxRequestHandler (DisMaxRequestHandler
 doesn't break on the query, at least I haven't seen it yet). If the standard
 request handler throws an error (invalid field, too many clauses, etc.),
 then I pass the original query to the DisMax request handler.
 
 Alex
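
A rough SolrJ sketch of the fallback routing described above; the handler names
and the crude field:term test are assumptions for illustration, not Alexey's
actual code:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.response.QueryResponse;

    // Sketch: send fielded queries (field:term) to the standard handler,
    // everything else to dismax, and fall back to dismax if the standard
    // parser rejects the input.
    class QueryRouter {
        private final SolrServer solr;

        QueryRouter(SolrServer solr) {
            this.solr = solr;
        }

        QueryResponse search(String userInput) throws SolrServerException {
            if (userInput.matches("\\w+:\\S+.*")) {      // crude field:term check
                try {
                    return query(userInput, "standard");
                } catch (Exception e) {
                    // invalid field, too many clauses, etc. -- retry with dismax
                }
            }
            return query(userInput, "dismax");
        }

        private QueryResponse query(String q, String handler) throws SolrServerException {
            SolrQuery query = new SolrQuery(q);
            query.setQueryType(handler);
            return solr.query(query);
        }
    }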
 
 On Mon, Nov 9, 2009 at 10:05 PM, michael8 mich...@saracatech.com wrote:

 Hi Julian,

  Saw your post on exactly the question I have.  I'm curious if you got any
 response directly, or figured out a way to do this by now that you could
 share?  I'm in the same situation trying to 'sanitize' the query string
 coming in before handing it to solr.  I do see that characters like :
 could break the query, but am curious if anyone has come up with a
 general
 solution as I think this must be a fairly common problem for any solr
 deployment to tackle.

 Thanks,
 Michael


 Julian Davchev wrote:

 Hi,
 Is there anything special that can be done for sanitizing user input
 before it is passed as a query to Solr?
 Not allowing * and ? as the first char is the only thing I can think of
 right now. Anything else it should somehow handle?

 I am not able to find any relevant document.



 --
 View this message in context:
 http://old.nabble.com/sanizing-filtering-query-string-for-security-tp21516844p26271891.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://old.nabble.com/sanizing-filtering-query-string-for-security-tp21516844p26274459.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: question about collapse.type = adjacent

2009-11-03 Thread michael8

Hi Martijn,

This clarifies it all for me.  Thanks a lot!

Michael


Martijn v Groningen wrote:
 
 Hi Michael,
 
 Field collapsing is basically done in two steps. The first step is to
 get the uncollapsed, sorted documents (whether sorted by score or by a
 field value), and the second step is to apply the collapse algorithm to
 the uncollapsed documents. So yes, when specifying
 collapse.type=adjacent the documents get collapsed after the sort
 has been applied, but this is also the case when not specifying
 collapse.type=adjacent.
 I hope this answers your question.
 
 Cheers,
 
 Martijn
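
In rough Java, the second step for collapse.type=adjacent over the already-sorted
documents amounts to something like this; an illustration only, not the patch
code, with types simplified:

    import java.util.ArrayList;
    import java.util.List;

    // Step two for collapse.type=adjacent: the documents are already in their
    // final sort order, and a document is collapsed away when its collapse-field
    // value equals that of the immediately preceding document.
    class AdjacentCollapser {
        // Each String[] is {docId, collapseFieldValue}, already sorted as requested.
        static List<String[]> collapse(List<String[]> sortedDocs) {
            List<String[]> kept = new ArrayList<String[]>();
            String previousGroup = null;
            for (String[] doc : sortedDocs) {
                String group = doc[1];
                if (previousGroup == null || !previousGroup.equals(group)) {
                    kept.add(doc);          // first doc of an adjacent run is kept
                }                           // the rest of the run is collapsed
                previousGroup = group;
            }
            return kept;
        }
    }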
 
 2009/11/2 michael8 mich...@saracatech.com:

 Hi,

 I would like to confirm if 'adjacent' in collapse.type means the
 documents
 (with the same collapse field value) are considered adjacent *after* the
 'sort' param from the query has been applied, or *before*?  I would think
 it
 would be *after*, since the collapse feature is primarily meant for
 presentation use.

 Thanks,
 Michael
 --
 View this message in context:
 http://old.nabble.com/question-about-collapse.type-%3D-adjacent-tp26157114p26157114.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 
 -- 
 Met vriendelijke groet,
 
 Martijn van Groningen
 
 

-- 
View this message in context: 
http://old.nabble.com/question-about-collapse.type-%3D-adjacent-tp26157114p26189401.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: apply a patch on solr

2009-11-03 Thread michael8

Perfect.  This is what I needed to know instead of patching 'in the dark'.
Good thing an SVN revision cuts across all files like a tag.

Thanks Mike!

Michael


cambridgemike wrote:
 
 You can see what revision the patch was written for at the top of the
 patch,
 it will look like this:
 
 Index: org/apache/solr/handler/MoreLikeThisHandler.java
 ===
 --- org/apache/solr/handler/MoreLikeThisHandler.java (revision 772437)
 +++ org/apache/solr/handler/MoreLikeThisHandler.java (working copy)
 
 now check out revision 772437 using the --revision switch in svn, patch
 away, and then svn up to make sure everything merges cleanly.  This is a
 good guide to follow as well:
 http://www.mail-archive.com/solr-user@lucene.apache.org/msg10189.html
 
 cheers,
 -mike
 
 On Mon, Nov 2, 2009 at 3:55 PM, michael8 mich...@saracatech.com wrote:
 

 Hi,

 First I like to pardon my novice question on patching solr (1.4).  What I
 like to know is, given a patch, like the one for collapse field, how
 would
 one go about knowing what solr source that patch is meant for since this
 is
 a source level patch?  Wouldn't the exact versions of a set of java files
 to
 be patched critical for the patch to work properly?

 So far what I have done is to pull the latest collapse field patch down
 from
 http://issues.apache.org/jira/browse/SOLR-236 (field-collapse-5.patch),
 and
 then svn up the latest trunk from
 http://svn.apache.org/repos/asf/lucene/solr/trunk/, then patch and build.
 Intuitively I was thinking I should be doing svn up to a specific
 revision/tag instead of just latest.  So far everything seems fine, but I
 just want to make sure I'm doing the right thing and not just being
 lucky.

 Thanks,
 Michael
 --
 View this message in context:
 http://old.nabble.com/apply-a-patch-on-solr-tp26157827p26157827.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://old.nabble.com/apply-a-patch-on-solr-tp26157827p26189573.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: apply a patch on solr

2009-11-03 Thread michael8

Hmmm, perhaps I jumped the gun.  I just looked over the field collapse patch
for SOLR-236 and each file listed in the patch has its own revision #.  

E.g. from field-collapse-5.patch:
--- src/java/org/apache/solr/core/SolrConfig.java   (revision 824364)
--- src/solrj/org/apache/solr/client/solrj/response/QueryResponse.java
(revision 816372)
--- src/solrj/org/apache/solr/client/solrj/SolrQuery.java   (revision 
823653)
--- src/java/org/apache/solr/search/SolrIndexSearcher.java  (revision 
794328)
--- src/java/org/apache/solr/search/DocSetHitCollector.java (revision
794328)

Unless there is a better way, it seems like I would need to do svn up
--revision ... for each of the files to be patched and then apply the
patch?  This seems error-prone and tedious.  Am I missing something simpler
here?

Michael


michael8 wrote:
 
 Perfect.  This is what I needed to know instead of patching 'in the dark'.
 Good thing an SVN revision cuts across all files like a tag.
 
 Thanks Mike!
 
 Michael
 
 
 cambridgemike wrote:
 
 You can see what revision the patch was written for at the top of the
 patch,
 it will look like this:
 
 Index: org/apache/solr/handler/MoreLikeThisHandler.java
 ===
 --- org/apache/solr/handler/MoreLikeThisHandler.java (revision 772437)
 +++ org/apache/solr/handler/MoreLikeThisHandler.java (working copy)
 
 now check out revision 772437 using the --revision switch in svn, patch
 away, and then svn up to make sure everything merges cleanly.  This is a
 good guide to follow as well:
 http://www.mail-archive.com/solr-user@lucene.apache.org/msg10189.html
 
 cheers,
 -mike
 
 On Mon, Nov 2, 2009 at 3:55 PM, michael8 mich...@saracatech.com wrote:
 

 Hi,

 First I like to pardon my novice question on patching solr (1.4).  What
 I
 like to know is, given a patch, like the one for collapse field, how
 would
 one go about knowing what solr source that patch is meant for since this
 is
 a source level patch?  Wouldn't the exact versions of a set of java
 files
 to
 be patched critical for the patch to work properly?

 So far what I have done is to pull the latest collapse field patch down
 from
 http://issues.apache.org/jira/browse/SOLR-236 (field-collapse-5.patch),
 and
 then svn up the latest trunk from
 http://svn.apache.org/repos/asf/lucene/solr/trunk/, then patch and
 build.
 Intuitively I was thinking I should be doing svn up to a specific
 revision/tag instead of just latest.  So far everything seems fine, but
 I
 just want to make sure I'm doing the right thing and not just being
 lucky.

 Thanks,
 Michael
 --
 View this message in context:
 http://old.nabble.com/apply-a-patch-on-solr-tp26157827p26157827.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 
 

-- 
View this message in context: 
http://old.nabble.com/apply-a-patch-on-solr-tp26157827p26190563.html
Sent from the Solr - User mailing list archive at Nabble.com.



question about collapse.type = adjacent

2009-11-02 Thread michael8

Hi,

I would like to confirm if 'adjacent' in collapse.type means the documents
(with the same collapse field value) are considered adjacent *after* the
'sort' param from the query has been applied, or *before*?  I would think it
would be *after*, since the collapse feature is primarily meant for presentation
use.

Thanks,
Michael
-- 
View this message in context: 
http://old.nabble.com/question-about-collapse.type-%3D-adjacent-tp26157114p26157114.html
Sent from the Solr - User mailing list archive at Nabble.com.



apply a patch on solr

2009-11-02 Thread michael8

Hi,

First, I'd like to beg your pardon for my novice question on patching Solr (1.4).
What I'd like to know is: given a patch, like the one for collapse field, how would
one go about knowing which Solr source that patch is meant for, since this is
a source-level patch?  Wouldn't the exact versions of the set of Java files to
be patched be critical for the patch to work properly?

So far what I have done is to pull the latest collapse field patch down from
http://issues.apache.org/jira/browse/SOLR-236 (field-collapse-5.patch), and
then svn up the latest trunk from
http://svn.apache.org/repos/asf/lucene/solr/trunk/, then patch and build. 
Intuitively I was thinking I should be doing svn up to a specific
revision/tag instead of just latest.  So far everything seems fine, but I
just want to make sure I'm doing the right thing and not just being lucky.

Thanks,
Michael
-- 
View this message in context: 
http://old.nabble.com/apply-a-patch-on-solr-tp26157826p26157826.html
Sent from the Solr - User mailing list archive at Nabble.com.



apply a patch on solr

2009-11-02 Thread michael8

Hi,

First, I'd like to beg your pardon for my novice question on patching Solr (1.4).
What I'd like to know is: given a patch, like the one for collapse field, how would
one go about knowing which Solr source that patch is meant for, since this is
a source-level patch?  Wouldn't the exact versions of the set of Java files to
be patched be critical for the patch to work properly?

So far what I have done is to pull the latest collapse field patch down from
http://issues.apache.org/jira/browse/SOLR-236 (field-collapse-5.patch), and
then svn up the latest trunk from
http://svn.apache.org/repos/asf/lucene/solr/trunk/, then patch and build. 
Intuitively I was thinking I should be doing svn up to a specific
revision/tag instead of just latest.  So far everything seems fine, but I
just want to make sure I'm doing the right thing and not just being lucky.

Thanks,
Michael
-- 
View this message in context: 
http://old.nabble.com/apply-a-patch-on-solr-tp26157827p26157827.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: dih.last_index_time - exacty what time is this capturing?

2009-10-11 Thread michael8

Thanks for your clarification Shalin.  

Given your explanation, would you agree that there is still a small window
(however small it may be) where some documents could be missed in the next
delta using dih.last_index_time if the data source adds or updates documents
very frequently?  I.e., in the time between the SQL finishing execution and
Solr receiving the data to start indexing, some new/updated documents may have
been written to the DB such that their timestamps are slightly before the
captured last_index_time when indexing starts?

Michael


Shalin Shekhar Mangar wrote:
 
 On Sat, Oct 10, 2009 at 1:42 AM, michael8 mich...@saracatech.com wrote:
 

 Hi,

 Does anyone know when exactly is the dih.last_index_time in
 dataimport.properties captured?  E.g. start of issueing SQL to data
 source,
 end of executing SQL to data source to fetch the list of IDs that have
 changed since last index, end of indexing all changed/new documents?  The
 name seems to imply 'end of indexing all changed/new docs', but i just
 want
 to be sure.


 last_index_time is set to current date/time before the actual indexing is
 started. The rationale is not to miss any documents. If we had set the
 last_index_time after the indexing is completed then we may lose the rows
 inserted/modified after the query of the previous import. In the current
 setup, some documents may get re-imported again but because most users
 have
 a uniqueKey, it is not a big problem.
 
 
 Also, I noticed a discrepancy between the commented time string and the
 actual last_index_time value.  Is the commented time (#) the time the
 file
 was written, vs. the actual last index time?

 #Fri Oct 09 13:01:57 PDT 2009
 item.last_index_time=2009-10-09 12\:58\:10
 last_index_time=2009-10-09 12\:58\:10


 The commented time is the time at which the property file was written.
 This
 is automatically added by Java's Properties class.
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
 
 

-- 
View this message in context: 
http://www.nabble.com/dih.last_index_time---exacty-what-time-is-this-capturing--tp25827228p25844816.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: dih.last_index_time - exacty what time is this capturing?

2009-10-11 Thread michael8

That's perfect.  Re-importing and re-indexing some documents redundantly because
of the slight time overlap is well worth it to avoid the risk of losing docs.  Thanks Shalin.

Michael


Shalin Shekhar Mangar wrote:
 
 On Sun, Oct 11, 2009 at 9:46 PM, michael8 mich...@saracatech.com wrote:
 

 Thanks for your clarification Shalin.

 Given your explanation, would you agree that there is still a small
 window
 (how ever small this may be) where some documents could be missed in the
 next delta using dih.last_index_time if the data source adds or updates
 documents very frequently?  i.e. the time between the SQL done executing
 and
 data received by Solr to start indexing, some new/updated documents may
 have
 been written in the DB such that the timestamps for those documents are
 slightly before the captured last_index_time when indexing starts?


 The last_index_time is recorded before any SQL queries are fired so I
 don't
 think any rows could be missed. Some could be imported more than once
 though.
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
 
 

-- 
View this message in context: 
http://www.nabble.com/dih.last_index_time---exacty-what-time-is-this-capturing--tp25827228p25850464.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr 1.4 formats last_index_time for SQL differently than 1.3 ?!?

2009-10-09 Thread michael8

Thanks Shalin.  Patch works well for me too.

Michael


Shalin Shekhar Mangar wrote:
 
 On Thu, Oct 8, 2009 at 1:38 AM, michael8 mich...@saracatech.com wrote:
 

 2 things I noticed that are different from 1.3 to 1.4 for DataImport:

 1. there are now 2 datetime values (per my specific schema I'm sure) in
 the
 dataimport.properties vs. only 1 in 1.3 (using the exact same schema). 
 One
 is 'last_index_time' same as 1.3, and a *new* one (in 1.4) named
 item.last_index_time, where 'item' is my main and only entity name
 specified
 in my data-import.xml.  they both have the same value.


 This was added with SOLR-783 to enable delta imports of entities
 individually. One can specify the entity name(s) which should be imported.
 Without this it was not possible to correctly figure out deltas on a
 per-entity basis.
 
 
 2. in 1.3, the datetime passed to SQL used to be, e.g., '2009-10-05
 14:08:01', but with 1.4 the format becomes 'Mon Oct 05 14:08:01 PDT
 2009',
 with the day of week, name of month, and timezone spelled out.  I had
 issue
 with the 1.4 format with MySQL only for the timezone part, but now I have
 a
 different solution without using this last index date altogether.


 I just committed SOLR-1496 so the different date format issue is fixed in
 trunk.
 
 
 I'm curious though if there's any config setting to pass to
 DataImportHandler to specify the desired date/time format to use.


 There is no configuration to change this. However, you can write your own
 Evaluator to output ${dih.last_index_time} in whatever format you prefer.
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
 
 

-- 
View this message in context: 
http://www.nabble.com/solr-1.4-formats-last_index_time-for-SQL-differently-than-1.3--%21--tp25776496p25826421.html
Sent from the Solr - User mailing list archive at Nabble.com.



dih.last_index_time - exacty what time is this capturing?

2009-10-09 Thread michael8

Hi,

Does anyone know exactly when the dih.last_index_time in
dataimport.properties is captured?  E.g. the start of issuing SQL to the data source,
the end of executing the SQL that fetches the list of IDs that have
changed since the last index, or the end of indexing all changed/new documents?  The
name seems to imply 'end of indexing all changed/new docs', but I just want
to be sure.  

Also, I noticed a discrepancy between the commented time string and the
actual last_index_time value.  Is the commented time (#) the time the file
was written, vs. the actual last index time?

#Fri Oct 09 13:01:57 PDT 2009
item.last_index_time=2009-10-09 12\:58\:10
last_index_time=2009-10-09 12\:58\:10

Thanks,
Michael

-- 
View this message in context: 
http://www.nabble.com/dih.last_index_time---exacty-what-time-is-this-capturing--tp25827228p25827228.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr 1.4 formats last_index_time for SQL differently than 1.3 ?!?

2009-10-07 Thread michael8

2 things I noticed that are different from 1.3 to 1.4 for DataImport:

1. there are now 2 datetime values (per my specific schema I'm sure) in the
dataimport.properties vs. only 1 in 1.3 (using the exact same schema).  One
is 'last_index_time' same as 1.3, and a *new* one (in 1.4) named
item.last_index_time, where 'item' is my main and only entity name specified
in my data-import.xml.  They both have the same value.

2. in 1.3, the datetime passed to SQL used to be, e.g., '2009-10-05
14:08:01', but with 1.4 the format becomes 'Mon Oct 05 14:08:01 PDT 2009',
with the day of week, name of month, and timezone spelled out.  I had an issue
with the 1.4 format in MySQL, but only for the timezone part; I now have a
different solution that avoids using this last index date altogether.
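
For reference, the two formats side by side in plain JDK code; the patterns below
are my reading of the values shown, not anything taken from DataImportHandler
itself:

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.Locale;

    // The 1.3-style value looked like "2009-10-05 14:08:01"; the 1.4 nightly in
    // question wrote the Date.toString() style "Mon Oct 05 14:08:01 PDT 2009".
    public class LastIndexTimeFormats {
        public static void main(String[] args) throws Exception {
            SimpleDateFormat sqlStyle =
                new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US);
            SimpleDateFormat toStringStyle =
                new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy", Locale.US);

            Date d = toStringStyle.parse("Mon Oct 05 14:08:01 PDT 2009");
            System.out.println(sqlStyle.format(d)); // 2009-10-05 14:08:01 (in the local zone)
        }
    }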

I'm curious though if there's any config setting to pass to
DataImportHandler to specify the desired date/time format to use.

Michael



Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
 
 really?
 I don't remember that being changed.
 
 What difference do you notice?
 
 On Wed, Oct 7, 2009 at 2:30 AM, michael8 mich...@saracatech.com wrote:

 Just looking for confirmation from others, but it appears that the
 formatting
 of last_index_time from dataimport.properties (using DataImportHandler)
 is
 different in 1.4 vs. that in 1.3.  I was troubleshooting why delta
 imports
 are no longer working for me after moving over to solr 1.4 (10/2 nightly)
 and
 noticed that format is different.

 Michael
 --
 View this message in context:
 http://www.nabble.com/solr-1.4-formats-last_index_time-for-SQL-differently-than-1.3--%21--tp25776496p25776496.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 
 -- 
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com
 
 

-- 
View this message in context: 
http://www.nabble.com/solr-1.4-formats-last_index_time-for-SQL-differently-than-1.3--%21--tp25776496p25793468.html
Sent from the Solr - User mailing list archive at Nabble.com.



solr 1.4 formats last_index_time for SQL differently than 1.3 ?!?

2009-10-06 Thread michael8

Just looking for confirmation from others, but it appears that the formatting
of last_index_time from dataimport.properties (using DataImportHandler) is
different in 1.4 vs. that in 1.3.  I was troubleshooting why delta imports
are no longer working for me after moving over to solr 1.4 (10/2 nightly) and
noticed that format is different.

Michael
-- 
View this message in context: 
http://www.nabble.com/solr-1.4-formats-last_index_time-for-SQL-differently-than-1.3--%21--tp25776496p25776496.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: download pre-release nightly solr 1.4

2009-09-28 Thread michael8



markrmiller wrote:
 
 michael8 wrote:

 markrmiller wrote:
   
 michael8 wrote:
 
 Hi,

 I know Solr 1.4 is going to be released any day now pending Lucene 2.9
 release.  Is there anywhere where one can download a pre-released
 nightly
 build of Solr 1.4 just for getting familiar with new features (e.g.
 field
 collapsing)?

 Thanks,
 Michael
   
   
 You can download nightlies
 here: http://people.apache.org/builds/lucene/solr/nightly/

 field collapsing won't be in 1.4 though. You have to build from svn
 after applying the patch for that.

 -- 
 - Mark

 http://www.lucidimagination.com





 

 Thanks for the info Mark.  If field collapsing is a patch, can I apply
 the
 patch against 1.3 then?  Thanks again.

 Michael

   
 Not likely - it has to apply to the current code. If you can find an old
 patch that works with 1.3 (not sure when the patches for that started),
 it's possible.
 But you would be using a very old patch (not sure there is one that
 applies to 1.3 trunk either, but you could check).
 
 -- 
 - Mark
 
 http://www.lucidimagination.com
 
 
 
 
 

Thanks again Mark.  I think it's better that I go with patching 1.4, when
it's ready, for the field collapse feature.

Michael
-- 
View this message in context: 
http://www.nabble.com/download-pre-release-nightly-solr-1.4-tp25590281p25649529.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: download pre-release nightly solr 1.4

2009-09-25 Thread michael8



markrmiller wrote:
 
 michael8 wrote:
 Hi,

 I know Solr 1.4 is going to be released any day now pending Lucene 2.9
 release.  Is there anywhere where one can download a pre-released nightly
 build of Solr 1.4 just for getting familiar with new features (e.g. field
 collapsing)?

 Thanks,
 Michael
   
 You can download nightlies
 here: http://people.apache.org/builds/lucene/solr/nightly/
 
 field collapsing won't be in 1.4 though. You have to build from svn
 after applying the patch for that.
 
 -- 
 - Mark
 
 http://www.lucidimagination.com
 
 
 
 
 

Thanks for the info Mark.  If field collapsing is a patch, can I apply the
patch against 1.3 then?  Thanks again.

Michael

-- 
View this message in context: 
http://www.nabble.com/download-pre-release-nightly-solr-1.4-tp25590281p25615553.html
Sent from the Solr - User mailing list archive at Nabble.com.



download pre-release nightly solr 1.4

2009-09-24 Thread michael8

Hi,

I know Solr 1.4 is going to be released any day now pending Lucene 2.9
release.  Is there anywhere where one can download a pre-released nightly
build of Solr 1.4 just for getting familiar with new features (e.g. field
collapsing)?

Thanks,
Michael
-- 
View this message in context: 
http://www.nabble.com/download-pre-release-nightly-solr-1.4-tp25590281p25590281.html
Sent from the Solr - User mailing list archive at Nabble.com.



Looking for suggestion of WordDelimiter filter config and 'ALMA awards'

2009-09-24 Thread michael8

Hi,

I have this situation that I believe is very common but was curious if
anyone knows the right way to go about solving it.  

I have a document with 'ALMA awards' in it.  However, when a user searches for
'aLMA awards', it ends up with no results found, whereas when I search for
'alma awards' or 'ALMA awards', the right results come back as expected.

I immediately went to solr/admin/analysis to see what is going on with the
indexing of 'ALMA awards' and the query parsing of 'aLMA awards', and it looks
like WordDelimiter is the one causing the mismatch.  WordDelimiter, with
splitOnCaseChange=1, will turn my search query 'aLMA awards' into 'a' and
'LMA' and 'awards', which is exactly what splitOnCaseChange does.  In this
type of situation, is there a proper way to handle the case where
the user simply got the case wrong for the 1st letter, or maybe n letters?
I like the benefits that the WordDelimiter filter w/ splitOnCaseChange provides
me, but I am not sure what the proper way is to solve this situation without
compromising on the other benefits this filter provides.  I also tried
preserveOriginal=1, hoping that aLMA would be preserved and later become
all-lowercase alma via another filter, but with no luck.
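
To make the mismatch concrete, here is a tiny stand-alone imitation of the
case-transition split (plain Java, just mimicking the behavior, not the real
WordDelimiter filter):

    import java.util.ArrayList;
    import java.util.List;

    // Mimics the case-transition split that WordDelimiter applies with
    // splitOnCaseChange=1: "aLMA" -> [a, LMA], so the query no longer
    // matches the indexed "ALMA". Illustration only.
    public class CaseChangeSplit {
        static List<String> split(String term) {
            List<String> parts = new ArrayList<String>();
            StringBuilder current = new StringBuilder();
            for (int i = 0; i < term.length(); i++) {
                char c = term.charAt(i);
                boolean caseChange = i > 0
                    && Character.isLowerCase(term.charAt(i - 1))
                    && Character.isUpperCase(c);
                if (caseChange && current.length() > 0) {
                    parts.add(current.toString());
                    current = new StringBuilder();
                }
                current.append(c);
            }
            if (current.length() > 0) {
                parts.add(current.toString());
            }
            return parts;
        }

        public static void main(String[] args) {
            System.out.println(split("aLMA"));  // [a, LMA]
            System.out.println(split("ALMA"));  // [ALMA]
            System.out.println(split("alma"));  // [alma]
        }
    }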

P.S.: I am basically using the standard config for 'text' fieldtype for my
default search field. (solr 1.3)

Thanks,
Michael
-- 
View this message in context: 
http://www.nabble.com/Looking-for-suggestion-of-WordDelimiter-filter-config-and-%27ALMA-awards%27-tp25591381p25591381.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: standard requestHandler components

2009-09-12 Thread michael8

Hi Jay,

I got it from reading your response.  I did browse around in solrconfig.xml
but could not find any components configured for 'standard', and didn't
realize that there are 'defaults' hardwired in.  Thanks for your quick and
detailed response and also for the additional tip on the spellcheck config.  You
saved me lots of time on trial and error.

Regards,
Michael


Jay Hill wrote:
 
 RequestHandlers are configured in solrconfig.xml. If no components are
 explicitly declared in the request handler config, then the defaults are
 used. They are:
 - QueryComponent
 - FacetComponent
 - MoreLikeThisComponent
 - HighlightComponent
 - StatsComponent
 - DebugComponent
 
 If you wanted to have a custom list of components (either omitting
 defaults
 or adding custom) you can specify the components for a handler directly:
 <arr name="components">
   <str>query</str>
   <str>facet</str>
   <str>mlt</str>
   <str>highlight</str>
   <str>debug</str>
   <str>someothercomponent</str>
 </arr>
 
 You can add components before or after the main ones like this:
 <arr name="first-components">
   <str>mycomponent</str>
 </arr>
 
 <arr name="last-components">
   <str>myothercomponent</str>
 </arr>
 
 and that's how the spell check component can be added:
 <arr name="last-components">
   <str>spellcheck</str>
 </arr>
 
 Note that a component (except the defaults) must be configured in
 solrconfig.xml with the name used in the <str> element as well.
 
 Have a look at the solrconfig.xml in the example directory
 (.../example/solr/conf/) for examples on how to set up the spellcheck
 component, and on how the request handlers are configured.
 
 -Jay
 http://www.lucidimagination.com
 
 
 On Fri, Sep 11, 2009 at 3:04 PM, michael8 mich...@saracatech.com wrote:
 

 Hi,

 I have a newbie question about the 'standard' requestHandler in
 solrconfig.xml.  What I like to know is where is the config information
 for
 this requestHandler kept?  When I go to http://localhost:8983/solr/admin,
 I
 see the following info, but am curious where are the supposedly 'chained'
 components (e.g. QueryComponent, FacetComponent, MoreLikeThisComponent)
 configured for this requestHandler.  I see timing and process debug
 output
 from these components with debugQuery=true, so somewhere these
 components
 must have been configured for this 'standard' requestHandler.

 name:standard
 class:  org.apache.solr.handler.component.SearchHandler
 version:$Revision: 686274 $
 description:Search using components:

 org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.DebugComponent,
 stats:  handlerStart : 1252703405335
 requests : 3
 errors : 0
 timeouts : 0
 totalTime : 201
 avgTimePerRequest : 67.0
 avgRequestsPerSecond : 0.015179728


 What I like to do from understanding this is to properly integrate
 spellcheck component into the standard requestHandler as suggested in a
 solr
 spellcheck example.

 Thanks for any info in advance.
 Michael
 --
 View this message in context:
 http://www.nabble.com/%22standard%22-requestHandler-components-tp25409075p25409075.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://www.nabble.com/%22standard%22-requestHandler-components-tp25409075p25414682.html
Sent from the Solr - User mailing list archive at Nabble.com.



standard requestHandler components

2009-09-11 Thread michael8

Hi, 

I have a newbie question about the 'standard' requestHandler in
solrconfig.xml.  What I'd like to know is where the config information for
this requestHandler is kept.  When I go to http://localhost:8983/solr/admin, I
see the following info, but am curious where are the supposedly 'chained'
components (e.g. QueryComponent, FacetComponent, MoreLikeThisComponent)
configured for this requestHandler.  I see timing and process debug output
from these components with debugQuery=true, so somewhere these components
must have been configured for this 'standard' requestHandler.  

name:standard  
class:  org.apache.solr.handler.component.SearchHandler  
version:$Revision: 686274 $  
description:Search using components:
org.apache.solr.handler.component.QueryComponent,org.apache.solr.handler.component.FacetComponent,org.apache.solr.handler.component.MoreLikeThisComponent,org.apache.solr.handler.component.HighlightComponent,org.apache.solr.handler.component.DebugComponent,
  
stats:  handlerStart : 1252703405335
requests : 3
errors : 0
timeouts : 0
totalTime : 201
avgTimePerRequest : 67.0
avgRequestsPerSecond : 0.015179728 


What I'd like to do, once I understand this, is to properly integrate the
spellcheck component into the standard requestHandler as suggested in a Solr
spellcheck example.

Thanks for any info in advance.
Michael
-- 
View this message in context: 
http://www.nabble.com/%22standard%22-requestHandler-components-tp25409075p25409075.html
Sent from the Solr - User mailing list archive at Nabble.com.