Stemming

2010-07-20 Thread Blargy

I am using the LucidKStemmer and I noticed that it doesn't stem certain
words, for example bags. How could I create a list of explicit words to
stem, i.e., sort of the opposite of protected words?

I know this can be accomplished using the synonyms file but I want to know
how to just replace one word with another. 

This is a bags test = This is a bag test
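
For the simple replace-one-word case, an explicit (one-way) mapping in a
synonyms-format file does exactly that replacement; a minimal sketch, assuming
a hypothetical stem_overrides.txt wired into the field's analyzer ahead of
the stemmer:

# stem_overrides.txt -- explicit one-way replacements
bags => bag

<filter class="solr.SynonymFilterFactory" synonyms="stem_overrides.txt"
        ignoreCase="true" expand="false"/>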


Re: Stemming

2010-07-20 Thread Blargy

Perfect!

Is there an associated JIRA ticket/patch for this so I can patch my 1.4.1
build?


RE: Foreign characters question

2010-07-14 Thread Blargy

Thanks for the reply but that didn't help.

Tomcat is accepting foreign characters, but for some reason when it reads the
synonyms file and encounters the character ñ, it doesn't appear correctly
in the Field Analysis admin. It shows up as �. If I query for ñ directly it
will work, but the synonyms file is screwy.


Re: Foreign characters question

2010-07-14 Thread Blargy

How can I tell whether a synonyms file is UTF-8, and/or create one that is?
Do I have to instruct Solr that this file is UTF-8?


Re: Foreign characters question

2010-07-14 Thread Blargy

Never mind. Apparently my IDE (NetBeans) was set to No encoding... wtf.
Changed it to UTF-8, recreated the file, and all is good now. Thanks!


Foreign characters question

2010-07-13 Thread Blargy

I am trying to add the following synonym while indexing/searching

swimsuit, bañadores, bañador

I tested searching for bañadores, but it didn't return any results.
After further inspection I noticed in the field analysis admin that swimsuit
gets expanded to ba�adores. Not sure if it will show up here, but in place of
the ñ there is a black diamond with a white question mark in it.

So basically, how can I add support for foreign characters?  Thanks


MLT with boost capability

2010-07-09 Thread Blargy

I've asked this question in the past without too much success. I figured I
would try to revive it.

Is there a way I can incorporate boost functions with a MoreLikeThis search?
Can it be accomplished at the MLT request handler level or would I need to
create a custom request handler which in turn delegates the majority of the
search to a specialized instance of MLT? Can someone point me in the right
direction?

Thanks


Re: Custom PhraseQuery

2010-07-09 Thread Blargy

Oh, I didn't know about the different signatures of tf. Thanks for that
clarification.

It sounds like all I need to do is override tf(float) in the
SweetSpotSimilarity class to delegate to baselineTf, just like tf(int) does.
Is this correct?

Thanks
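
If that is the right hook, a minimal sketch (assuming the Lucene 2.9-era
contrib SweetSpotSimilarity, where baselineTf(float) is public and tf(int)
already delegates to it; the class name here is made up):

import org.apache.lucene.misc.SweetSpotSimilarity;

public class PhraseSweetSpotSimilarity extends SweetSpotSimilarity {
  // Phrase frequencies arrive via tf(float), which otherwise falls back to
  // DefaultSimilarity's sqrt(freq); route it through the same baseline curve.
  @Override
  public float tf(float freq) {
    return baselineTf(freq);
  }
}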


ValueSource/Function questions

2010-07-01 Thread Blargy

Can someone explain what the createWeight methods should do?

And would someone mind explaining what the hashCode method is doing in this
use case?

  public int hashCode() {
    int h = a.hashCode();
    h ^= (h << 13) | (h >>> 20);  // the shift/or pairs mix the bits around
    h += b.hashCode();
    h ^= (h << 23) | (h >>> 10);
    h += name().hashCode();
    return h;
  }


Custom PhraseQuery

2010-06-29 Thread Blargy

Is there any way to override/change the default PhraseQuery class that is
used, similar to how you can swap out the Similarity class?

Let me explain what I am trying to do. I would like to override how the TF is
calculated... always returning a max of 1 for phraseFreq.

For example:
Query: foo bar
Doc1: foo bar baz
Doc2: foo bar foo bar

These two documents should be scored exactly the same. I accomplished the
above in the normal query use-case by using the SweetSpotSimilarity class.
There doesn't happen to be a SweetSpotPhraseQuery class is there?

Thanks


Re: SweetSpotSimilarity

2010-06-28 Thread Blargy


iorixxx wrote:
 
 it is in schema.xml:
 
 <similarity class="org.apache.lucene.search.SweetSpotSimilarity"/>
 

How would you configure the tfBaselineTfFactors and LengthNormFactors when
configuring via schema.xml? Do I have to create a subclass that hardcodes
these values?


Re: SweetSpotSimilarity

2010-06-28 Thread Blargy


iorixxx wrote:
 
 CustomSimilarityFactory that extends
 org.apache.solr.schema.SimilarityFactory should do it. There is an example
 CustomSimilarityFactory.java under src/test/org...
 

This is exactly what I was looking for... this is very similar (no pun
intended ;) ) to the updateProcessorFactory configuration in
solrconfig.xml. The wiki should probably include this information.

Side question: how would I know if a configuration option can also take a
factory class, like in this instance?
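
For reference, a minimal factory sketch (assuming the Solr 1.4
SimilarityFactory API and the 2.9-era SweetSpotSimilarity setters; the class
and parameter names here are made up, and the init args would come from the
<similarity> element in schema.xml):

import org.apache.lucene.misc.SweetSpotSimilarity;
import org.apache.lucene.search.Similarity;
import org.apache.solr.schema.SimilarityFactory;

public class SweetSpotSimilarityFactory extends SimilarityFactory {
  @Override
  public Similarity getSimilarity() {
    SweetSpotSimilarity sim = new SweetSpotSimilarity();
    // "params" holds the init args declared on the element in schema.xml
    sim.setBaselineTfFactors(params.getFloat("baselineTfBase", 0.0f),
                             params.getFloat("baselineTfMin", 0.0f));
    sim.setLengthNormFactors(params.getInt("lnMin", 1),
                             params.getInt("lnMax", 1),
                             params.getFloat("lnSteepness", 0.5f));
    return sim;
  }
}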


Optimizing cache

2010-06-28 Thread Blargy

Here is a screen shot of our cache stats from New Relic.

http://s4.postimage.org/mmuji-31d55d69362066630eea17ad7782419c.png

Query cache: 55-65%
Filter cache: 100%
Document cache: 63%

The cache size is 512 for each of the three caches above.

How do I interpret this data? What are some optimal configuration changes
given the above stats?
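
How to read it, roughly: those percentages are hit ratios, and the usual
first move when a small cache shows evictions is to raise its size and
autowarm part of it across commits. A sketch only (sizes are illustrative and
depend on your query mix), in solrconfig.xml:

<filterCache class="solr.FastLRUCache" size="2048" initialSize="512" autowarmCount="256"/>
<queryResultCache class="solr.LRUCache" size="1024" initialSize="512" autowarmCount="128"/>
<!-- the document cache cannot be autowarmed; internal doc ids change between searchers -->
<documentCache class="solr.LRUCache" size="2048" initialSize="512"/>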


Re: SweetSpotSimilarity

2010-06-25 Thread Blargy


iorixxx wrote:
 
 it is in schema.xml:
 
 <similarity class="org.apache.lucene.search.SweetSpotSimilarity"/>
 

Thanks. I'm guessing this is all or nothing, i.e., you can't use one
similarity class for one request handler and another for a separate request
handler. Is that correct?





Similarity

2010-06-24 Thread Blargy

Can someone explain how I can override the default behavior where tf
contributes a higher score to documents containing repeated words?

For example:

Query: foo
Doc1: foo bar score 1.0
Doc2: foo foo bar score 1.1

Doc2 contains foo twice so it is scored higher. How can I override this
behavior?
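
One sketch of this (the class name is made up; note that flattening tf this
way also flattens phrase frequencies, since both funnel through tf):

import org.apache.lucene.search.DefaultSimilarity;

public class UniqueTermSimilarity extends DefaultSimilarity {
  // any positive term frequency scores the same as a single occurrence
  @Override
  public float tf(float freq) {
    return freq > 0 ? 1.0f : 0.0f;
  }
}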


Re: Similarity

2010-06-24 Thread Blargy


Yonik Seeley-2-2 wrote:
 
 Depends on the larger context of what you are trying to do.
 Do you still want the idf and length norm relevancy factors?  If not,
 use a filter, or boost the particular clause with 0.
 

I do want the other relevancy factors (boost, phrase boosting, etc.), but I
just want to make it so that each matching term contributes to the overall
score only once, no matter how often it appears in a document.

For example:

Query: foo
Doc1: foo bar baz
Doc2: foo foo bar

The above documents should have the same score.

Query: foo baz
Doc1: foo bar baz
Doc2: foo foo bar

In this example Doc1 should be scored higher because it has 2 unique terms
that match.




Re: anyone use hadoop+solr?

2010-06-22 Thread Blargy

Need, 

Seems like we are in the same boat. Our index consists of 5M records, which
roughly equals around 30 gigs. All in all that's not too bad; however, our
indexing process (we use DIH, but I'm now revisiting that idea) takes a
whopping 30+ hours!!!

I just bought the Hadoop in Action early edition but haven't had time to
read it yet. I was wondering what resources you are using to learn Hadoop
and, more importantly, its applications to Solr. Would you mind explaining
your thought process on how you will be using Hadoop in more detail?


LocalParams?

2010-06-21 Thread Blargy

Huh? I read through the wiki page (http://wiki.apache.org/solr/LocalParams)
but I still don't understand its utility.

Can someone explain to me why this would even be used? Any examples to help
clarify? Thanks!
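
A typical use, sketched with hypothetical field names: local params let a
single parameter value carry its own query parser and arguments, e.g.

  q={!dismax qf='title^2 description'}ipod
  fq={!field f=category}electronics

The first line runs the main query through the dismax parser without touching
solrconfig.xml; the second parses the filter as a plain field query even if
the default query parser is something else.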


jdbc4.CommunicationsException

2010-06-20 Thread Blargy

Does anyone know a solution to this problem? I've already tried
autoReconnect=true and it doesn't appear to help. This happened 34 hours
into my full-import... ouch! 

org.apache.solr.handler.dataimport.DataImportHandlerException:
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: The last packet
successfully received from the server was 21 milliseconds ago.  The last
packet sent successfully to the server was 124,896,004 milliseconds ago. is
longer than the server configured value of 'wait_timeout'. You should
consider either expiring and/or testing connection validity before use in
your application, increasing the server configured values for client
timeouts, or using the Connector/J connection property 'autoReconnect=true'
to avoid this problem.
at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:64)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:339)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$700(JdbcDataSource.java:228)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:262)
at
org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:78)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:361)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:246)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)



DIH - Total Documents Processed is missing

2010-06-20 Thread Blargy

It seems that when importing via DIH, the "Total Documents Processed" status
message does not appear when there are two entities for a given document. Is
this by design?

  <document>
    <entity name="one"/>
    <entity name="two"/>
  </document>



Re: Performance tuning

2010-06-18 Thread Blargy


Otis Gospodnetic-2 wrote:
 
 Smaller merge factor will make things worse - 
 

- Whoops... I guess I'll change it from 5 back to the default 10.


Re: Performance tuning

2010-06-18 Thread Blargy


Otis Gospodnetic-2 wrote:
 
 You may want to try the RPM tool, it will show you what inside of that
 QueryComponent is really slow.
 

We are already using it :)

Where should I be concentrating? Transaction traces?



Autosuggest/autocomplete/spellcheck phrases

2010-06-17 Thread Blargy

How can I preserve phrases for either autosuggest/autocomplete/spellcheck?

For example, we have a bunch of product listings, and if someone types
"louis" I want it to come up with "Louis Vuitton"; "world"... "World Cup".

Would I need n-grams? Shingling? Thanks
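
A sketch of one common setup for the autocomplete side (the field type name
and parameter values are illustrative; shingles preserve the phrases, edge
n-grams make the prefixes matchable):

<fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>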


Re: Autosuggest/autocomplete/spellcheck phrases

2010-06-17 Thread Blargy

Thanks for the reply, Michael. I'll definitely try that out and let you know
how it goes. Your solution sounds similar to the one I've read here:
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
 

There are some good comments in there too.

I think I am having the biggest trouble distinguishing what needs to be done
for autocomplete/autosuggest (Google-like behavior) from the separate issue
of spellchecking ("Did you mean..."). I originally thought those two
distinct features would involve the same solution, but it appears they are
completely different. Your solution sounds like it works best for
autocomplete, and I will be using it for that exact purpose ;) One question
though: how do you handle more popular words/documents over others?

Now my next question is: how would I get the spellchecker to work with
phrases? So if I typed "vitton" it would come back with something like: Did
you mean 'Louis Vuitton'? Will this also require a combination of n-grams
and shingles?

Thanks


Re: Autosuggest/autocomplete/spellcheck phrases

2010-06-17 Thread Blargy

Ok, that makes perfect sense.

"What I did was use a combination of the two, running the indexed terms
through..." - I initially read this as: you used your current index and
built up your dictionary from the terms in it.


DismaxRequestHandler

2010-06-17 Thread Blargy

I have a title field and a description field. I am searching across both
fields, but I don't want description matches unless they are within some
slop of each other. How can I query for this? It seems that I'm getting back
crazy results when there are matches that are nowhere near each other.



defType=Dismax questions

2010-06-17 Thread Blargy

Sorry for the repost, but I posted under DismaxRequestHandler when I should
have listed it as DismaxQueryParser, i.e., I'm using defType=dismax.

I have a title field and a description field. I am searching across both
fields, but I don't want description matches unless they are within some
slop of each other. How can I query for this? It seems that I'm getting back
crazy results when there are matches that are nowhere near each other.
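
With dismax, the usual tool for this is the phrase-field boost: qf controls
which fields must match, while pf/ps reward documents where the terms appear
as a phrase within the given slop. A sketch of handler defaults (handler
name, field names and boosts are illustrative):

<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title^2.0 description^0.5</str>
    <!-- boost docs whose description has the terms within 5 positions -->
    <str name="pf">description^3.0</str>
    <str name="ps">5</str>
  </lst>
</requestHandler>

Note pf/ps only boost; they don't exclude scattered matches outright, so
down-weighting description in qf is what keeps the "crazy" matches from
dominating.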


Performance tuning

2010-06-17 Thread Blargy

After indexing our item descriptions, our index grew from around 3 gigs to
17.5, and our search has deteriorated from sub-50ms searches to over 500ms.
The sick thing is I'm not even searching across that field at the moment,
although I plan to in the near future, as well as to include highlighting.

What size is considered too big for one index? When should one start looking
into sharding/federation, etc.?

What are some generic performance tuning options that could possibly help?
We are currently hosting 4 slaves. Would increasing the number of slaves
help?


Re: Performance tuning

2010-06-17 Thread Blargy

Is there an alternative for highlighting on a large stored field? I thought
that for highlighting you needed the field stored? I really just need the
excerpting feature, for highlighting relevant portions of our item
descriptions.

Not sure if this is because of the index size (17.5G) or because of
highlighting, but our slave servers are experiencing high loads, possibly
due to replication. That actually leads me to my next question: I thought
replication would only download new segments, without the need to always
re-download the whole index. This doesn't appear to be the case from what
I'm seeing. Am I wrong?

Thanks again



Re: Performance tuning

2010-06-17 Thread Blargy


Blargy - Please try to quote the mail you're responding to, at least  
 the relevant piece.  It's nice to see some context to the discussion.

No problem ;)


Depends - if you optimize the index on the master, then the entire index is
replicated.  If you simply commit and let Lucene take care of  adding
segments you'll generally reduce what is replicated. 

As a side question... would reducing the mergeFactor help at all? This is
currently what I am using...

<mainIndex>
  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>64</ramBufferSizeMB>
  <mergeFactor>5</mergeFactor>
  <unlockOnStartup>false</unlockOnStartup>
  <reopenReaders>true</reopenReaders>

  <deletionPolicy class="solr.SolrDeletionPolicy">
    <str name="maxCommitsToKeep">1</str>
    <str name="maxOptimizedCommitsToKeep">0</str>
  </deletionPolicy>

  <infoStream file="INFOSTREAM.txt">false</infoStream>
</mainIndex>


SpellCheckComponent questions

2010-06-16 Thread Blargy

Is it generally wiser to build the dictionary from the existing index? A
search log? Something else?

For "Did you mean", does one usually just use collate=true and then return
that string?

Should I be using a separate spellchecker handler, or should I just always
include spellcheck=true in my original search queries? I noticed that some
sample solrconfig files recommend against creating a separate request
handler just for spellcheck requests, but why should I tax every single
request when I really only want to perform a spellcheck when there are fewer
than x results?

I'm guessing that to achieve the above functionality (only spellcheck when
there are < x results) I could create a custom SearchComponent that
subclasses solr.SpellCheckComponent. If I decide to go down this route, how
can I get access to the number of results and/or the actual results?

Thanks again nabble ;)
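
A sketch of that subclass (the class name and threshold are made up; this
assumes the component is registered after the query component, e.g. as a
last-component, so the main result set is already on the ResponseBuilder):

import java.io.IOException;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SpellCheckComponent;

public class ConditionalSpellCheckComponent extends SpellCheckComponent {
  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // only suggest when the main query matched fewer than 10 docs
    if (rb.getResults() != null && rb.getResults().docList.matches() < 10) {
      super.process(rb);
    }
  }
}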





Re: SpellCheckComponent questions

2010-06-16 Thread Blargy

Follow-up question.

How can I influence the scoring of results that come back, either through
term frequency (if I build off an index) or through the # of search results
returned (if using a search log)?

Thanks


Re: how to apply patch SOLR-1316

2010-06-16 Thread Blargy

I'm trying to apply this via the command line: patch -p0 < SOLR-1316.patch.

When patching against trunk I get the following errors.

~/workspace $ patch -p0 < SOLR-1316.patch
patching file
dev/trunk/solr/src/java/org/apache/solr/handler/component/SpellCheckComponent.java
Hunk #2 succeeded at 575 (offset -3 lines).
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/AbstractLuceneSpellChecker.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/IndexBasedSpellChecker.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/SolrSpellChecker.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/BufferingTermFreqIteratorWrapper.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/FileDictionary.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/Lookup.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/SortedTermFreqIteratorWrapper.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/Suggester.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/UnsortedTermFreqIteratorWrapper.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/jaspell/JaspellLookup.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/jaspell/JaspellTernarySearchTrie.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/tst/TSTAutocomplete.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/tst/TSTLookup.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/tst/TernaryTreeNode.java
patching file
dev/trunk/solr/src/java/org/apache/solr/util/HighFrequencyDictionary.java
Hunk #1 FAILED at 54.
Hunk #2 FAILED at 69.
2 out of 2 hunks FAILED -- saving rejects to file
dev/trunk/solr/src/java/org/apache/solr/util/HighFrequencyDictionary.java.rej
patching file
dev/trunk/solr/src/java/org/apache/solr/util/SortedIterator.java
patching file
dev/trunk/solr/src/java/org/apache/solr/util/TermFreqIterator.java
patching file
dev/trunk/solr/src/test/org/apache/solr/spelling/suggest/SuggesterTest.java
patching file
dev/trunk/solr/src/test/test-files/solr/conf/schema-spellchecker.xml
patching file
dev/trunk/solr/src/test/test-files/solr/conf/solrconfig-spellchecker.xml

Patching against the 1.4.0 tag I get the following errors

$ patch -p0 < SOLR-1316.patch
patching file
dev/trunk/solr/src/java/org/apache/solr/handler/component/SpellCheckComponent.java
Hunk #1 succeeded at 102 (offset -5 lines).
Hunk #2 succeeded at 348 (offset -230 lines).
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/AbstractLuceneSpellChecker.java
Hunk #1 succeeded at 40 (offset 1 line).
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/IndexBasedSpellChecker.java
Hunk #1 succeeded at 105 (offset 3 lines).
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/SolrSpellChecker.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/BufferingTermFreqIteratorWrapper.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/FileDictionary.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/Lookup.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/SortedTermFreqIteratorWrapper.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/Suggester.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/UnsortedTermFreqIteratorWrapper.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/jaspell/JaspellLookup.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/jaspell/JaspellTernarySearchTrie.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/tst/TSTAutocomplete.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/tst/TSTLookup.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/tst/TernaryTreeNode.java
patching file
dev/trunk/solr/src/java/org/apache/solr/util/HighFrequencyDictionary.java
patching file
dev/trunk/solr/src/java/org/apache/solr/util/SortedIterator.java
patching file
dev/trunk/solr/src/java/org/apache/solr/util/TermFreqIterator.java
patching file
dev/trunk/solr/src/test/org/apache/solr/spelling/suggest/SuggesterTest.java
patching file
dev/trunk/solr/src/test/test-files/solr/conf/schema-spellchecker.xml
patching file
dev/trunk/solr/src/test/test-files/solr/conf/solrconfig-spellchecker.xml
Hunk #1 succeeded at 86 with fuzz 1 (offset -6 lines).

As you can see, neither version appears to be working. I tried building
each, but neither would compile. Which version/tag should be used when
applying this patch?

Thanks



Re: Custom faceting question

2010-06-15 Thread Blargy

Got it. Thanks!


SolrCoreAware

2010-06-15 Thread Blargy

Can someone please explain what the inform method should accomplish? Thanks


SolrEventListener

2010-06-15 Thread Blargy

Can someone explain how to register a SolrEventListener? 

I am actually interested in using the SpellCheckerListener, and it appears
that it would build/rebuild a spellchecker index on commit and/or optimize,
but according to the wiki the only events that can be listened for are
firstSearcher and newSearcher
(http://wiki.apache.org/solr/SolrPlugins#SolrEventListener). Is the wiki
outdated or something?

So how can I register this (or any other event listener) to execute on
commit/optimize? Thanks
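
For commit/optimize specifically, solrconfig.xml supports postCommit and
postOptimize listener hooks on the update handler; a sketch, with a
hypothetical listener class:

<updateHandler class="solr.DirectUpdateHandler2">
  <listener event="postCommit" class="com.example.MyEventListener"/>
  <listener event="postOptimize" class="com.example.MyEventListener"/>
</updateHandler>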



Re: Custom faceting question

2010-06-14 Thread Blargy

: ...you've already got the conceptual model of how to do it, all you need
: now is to implement it as a Component that does the secondary-faceting in
: the same requests (which should definitley be more efficient since you can
: reuse the DocSets) instead of issuing secondary requets from your client

Couldn't I just create a custom search handler to do this, so that all the
logic resides on the server side? I'm guessing I would need to subclass
SearchHandler and override handleRequestBody.
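
A skeleton of that idea (the class name is made up; package locations are as
of Solr 1.4, and the second pass is only sketched in comments):

import org.apache.solr.handler.component.SearchHandler;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrQueryResponse;

public class TwoPassFacetHandler extends SearchHandler {
  @Override
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
      throws Exception {
    // first pass: normal search, faceting only on the top-level categories
    super.handleRequestBody(req, rsp);
    // second pass: read the top 3 category counts out of rsp, rewrite the
    // facet params accordingly, and facet again for their sub-categories
  }
}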


Re: Indexing HTML

2010-06-10 Thread Blargy

Do I even need to tidy/clean up the HTML if I use the
HTMLStripCharFilterFactory?


Indexing HTML

2010-06-09 Thread Blargy

What is the preferred way to index html using DIH (my html is stored in a
blob field in our database)? 

I know there is the built-in HTMLStripTransformer, but that doesn't seem to
work well with malformed/incomplete HTML. I've created a custom transformer
to first tidy up the HTML using JTidy, and then I pass it to the
HTMLStripTransformer like so:

<field column="description" name="description" tidy="true"
       ignoreErrors="true" propertiesFile="config/tidy.properties"/>
<field column="description" name="description" stripHTML="true"/>

However, this method isn't fool-proof, as you can see by my ignoreErrors
option.

I quickly took a peek at Tika and noticed that it has its own HtmlParser.
Is this something I should look into? Are there any alternatives that deal
with malformed/incomplete HTML? Thanks
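
For comparison, the analysis-time route discussed later in this thread; a
sketch (the field type name is made up). Because HTMLStripCharFilterFactory
strips markup as a character stream rather than parsing a document tree, it
tends to be forgiving of malformed HTML:

<fieldType name="html_text" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>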








Re: Indexing HTML

2010-06-09 Thread Blargy

Does the HTMLStripCharFilter apply at index time or query time? Would it
matter to use one over the other?

As a side question: if I want to perform highlighter summaries against this
field, do I need to store the whole field, or just index it with
TermVector.WITH_POSITIONS_OFFSETS?


Re: Indexing HTML

2010-06-09 Thread Blargy

Wait... do you mean I should try the HTMLStripCharFilterFactory analyzer at
index time?

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory
 


Custom faceting question

2010-06-03 Thread Blargy

I believe I'll need to write some custom code to accomplish what I want
(efficiently, that is), but I'm unsure of the best route to take. Will this
require a custom request handler? A search component?

Ok, the easiest way to explain is to show you what I want:
http://shop.ebay.com/?_from=R40_trksid=p3907.m570.l1313_nkw=fashion_sacat=See-All-Categories.

We have a similar category structure, whereby we have top-level categories
and then sub-categories. I want to be able to perform a search and then only
return the top 3 top-level categories with their sub-categories also
faceted. The problem is I don't know what those top 3 top-level categories
are until after I search.

The dumb, easy way: facet on all top-level categories and sub-categories.
This results in faceting on over 600 categories... probably not the best
route.

Second way: have the client send multiple requests on the backend, first to
determine the top 3 categories, then another for all the subcategories. This
involves more client-side coding, and I would prefer not to perform 2x the
requests. If at all possible I would like to do this on the Solr side.

Just to mention, sending multiple requests via ajax won't work because we
need the content on the page at render time.

Any suggestions, pointers? Thanks






Re: Importing large datasets

2010-06-02 Thread Blargy


"As a data point, I routinely see clients index 5M items on normal
hardware in approx. 1 hour (give or take 30 minutes)."

Our master Solr machine is running 64-bit RHEL 5.4 on a dedicated machine
with 4 cores and 16G of RAM, so I think we are good on the hardware. Our DB
is MySQL version 5.0.67 (exact stats I don't know off the top of my head).


"When you say quite large, what do you mean? Are we talking books here or
maybe a couple pages of text or just a couple KB of data?"

Our item descriptions are very similar to an eBay listing and can include
HTML. We are talking about a couple of pages of text.


"How long does it take you to get that data out (and, from the sounds of it,
merge it with your item) w/o going to Solr?"

I'll have to get back to you on that one.


"DataImportHandler now supports multiple threads."

When you say now, what do you mean? I am running version 1.4.


"The absolute fastest way that I know of to index is via multiple threads
sending batches of documents at a time (at least 100)."

Is there a wiki page explaining how this multi-threaded process works? Which
batch size would work best? I am currently using a -1 batch size.


"You may want to write your own multithreaded client to index."

This sounds like a viable option. Can you point me in the right direction on
where to begin (what classes to look at, prior examples, etc)?
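
For what it's worth, a minimal multithreaded starting point (a sketch only,
assuming the SolrJ that ships with 1.4; the URL, queue size, thread count
and field names are illustrative):

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
  public static void main(String[] args) throws Exception {
    // buffers up to 100 docs and drains them over 4 concurrent connections
    StreamingUpdateSolrServer server =
        new StreamingUpdateSolrServer("http://localhost:8983/solr", 100, 4);
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "item-1");
    doc.addField("description", "...");
    server.add(doc); // call in a loop over your rows
    server.commit();
  }
}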

Here is the field type I am using for the item description. Maybe it's not
the best?

  <fieldType name="text" class="solr.TextField" omitNorms="false">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1"
              generateNumberParts="1"
              catenateWords="1"
              catenateNumbers="1"
              catenateAll="1"
              splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

Here is an overview of my data-config.xml. Thoughts?

 <entity name="item"
         dataSource="datasource1"
         query="select * from items">
   ...
   <entity name="item_description"
           dataSource="datasource2"
           query="select description from item_descriptions where id=${item.id}"/>
 </entity>

I appreciate the help.


Re: Importing large datasets

2010-06-02 Thread Blargy


Andrzej Bialecki wrote:
 
 On 2010-06-02 12:42, Grant Ingersoll wrote:
 
 On Jun 1, 2010, at 9:54 PM, Blargy wrote:
 

 We have around 5 million items in our index and each item has a
 description
 located on a separate physical database. These item descriptions vary in
 size and for the most part are quite large. Currently we are only
 indexing
 items and not their corresponding description and a full import takes
 around
 4 hours. Ideally we want to index both our items and their descriptions
 but
 after some quick profiling I determined that a full import would take in
 excess of 24 hours. 

 - How would I profile the indexing process to determine if the
 bottleneck is
 Solr or our Database.
 
 As a data point, I routinely see clients index 5M items on normal
 hardware in approx. 1 hour (give or take 30 minutes).  
 
 When you say quite large, what do you mean?  Are we talking books here
 or maybe a couple pages of text or just a couple KB of data?
 
 How long does it take you to get that data out (and, from the sounds of
 it, merge it with your item) w/o going to Solr?
 
 - In either case, how would one speed up this process? Is there a way to
 run
 parallel import processes and then merge them together at the end?
 Possibly
 use some sort of distributed computing?
 
 DataImportHandler now supports multiple threads.  The absolute fastest
 way that I know of to index is via multiple threads sending batches of
 documents at a time (at least 100).  Often, from DBs one can split up the
 table via SQL statements that can then be fetched separately.  You may
 want to write your own multithreaded client to index.
 
 SOLR-1301 is also an option if you are familiar with Hadoop ...
 
 
 
 -- 
 Best regards,
 Andrzej Bialecki 
  ___. ___ ___ ___ _ _   __
 [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
 ___|||__||  \|  ||  |  Embedded Unix, System Integration
 http://www.sigram.com  Contact: info at sigram dot com
 
 
 

I haven't worked with Hadoop before, but I'm willing to try anything to cut
down this full-import time. I see this currently uses the embedded Solr
server for indexing... would I have to scrap my DIH importing then?


Re: Importing large datasets

2010-06-02 Thread Blargy


"As a data point, I routinely see clients index 5M items on normal hardware
in approx. 1 hour (give or take 30 minutes)."

Also wanted to add that our main entity (item) consists of 5 sub-entities
(i.e., joins). 2 of those 5 are fairly small, so I am using the
CachedSqlEntityProcessor for them, but the other 3 (which include
item_description) are normal.

All the entities minus item_description connect to datasource1. They
currently point to one physical machine, although we do have a pool of 3 DBs
that could be used if it helps. The other entity, item_description, uses
datasource2, which has a pool of 2 DBs that could potentially be used. Not
sure if that would help or not.

I might as well add that the item description will have indexed, stored and
term vectors set to true.


Re: Importing large datasets

2010-06-02 Thread Blargy



 One thing that might help indexing speed - create a *single* SQL query  
 to grab all the data you need without using DIH's sub-entities, at  
 least the non-cached ones.
 

Not sure how much that would help. As I mentioned, without the item
description the full import takes 4 hours, which is bearable. However, once
I started to import the item description, which is located on a separate
machine/database, the import process exploded to over 24 hours.



Re: Importing large datasets

2010-06-02 Thread Blargy


Lance Norskog-2 wrote:
 
 Wait! You're fetching records from one database and then doing lookups
 against another DB? That makes this a completely different problem.
 
 The DIH does not to my knowledge have the ability to pool these
 queries. That is, it will not build a batch of 1000 keys from
 datasource1 and then do a query against datasource2 with:
 select foo where key_field IN (key1, key2,... key1000);
 
 This is the efficient way to do what you want. You'll have to write
 your own client to do this.
 
 On Wed, Jun 2, 2010 at 12:00 PM, David Stuart
 david.stu...@progressivealliance.co.uk wrote:
 How long does it take to do a grab of all the data via SQL? I found by
 denormalizing the data into a lookup table meant that I was able to index
 about 300k rows of similar data size with dih regex spilting on some
 fields
 in about 8mins I know it's not quite the scale bit with batching...

 David Stuar

 On 2 Jun 2010, at 17:58, Blargy zman...@hotmail.com wrote:




 One thing that might help indexing speed - create a *single* SQL query
 to grab all the data you need without using DIH's sub-entities, at
 least the non-cached ones.


 Not sure how much that would help. As I mentioned that without the item
 description import the full process takes 4 hours which is bearable.
 However
 once I started to import the item description which is located on a
 separate
 machine/database the import process exploded to over 24 hours.


 
 
 
 -- 
 Lance Norskog
 goks...@gmail.com
 

What's more efficient: a batch size of 1000, or -1 for MySQL? Is this why
it's so slow, because I am using 2 different datasources?

Say I am using just one datasource: should I still be seeing "Creating a
connection for entity ..." for each sub-entity in the document, or should it
just be using one connection?
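
For what it's worth, with the MySQL driver batchSize="-1" makes DIH set the
statement fetchSize to Integer.MIN_VALUE, which tells Connector/J to stream
rows instead of buffering the whole result set; a sketch (host, db and
credentials hypothetical):

<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://db-host/items" user="..." password="..."
            batchSize="-1"/>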






Re: Importing large datasets

2010-06-02 Thread Blargy


Erik Hatcher-4 wrote:
 
 One thing that might help indexing speed - create a *single* SQL query  
 to grab all the data you need without using DIH's sub-entities, at  
 least the non-cached ones.
 
   Erik
 
 On Jun 2, 2010, at 12:21 PM, Blargy wrote:
 


 As a data point, I routinely see clients index 5M items on normal  
 hardware
 in approx. 1 hour (give or take 30 minutes).

 Also wanted to add that our main entity (item) consists of 5 sub- 
 entities
 (ie, joins). 2 of those 5 are fairly small so I am using
 CachedSqlEntityProcessor for them but the other 3 (which includes
 item_description) are normal.

 All the entites minus the item_description connect to datasource1.  
 They
 currently point to one physical machine although we do have a pool  
 of 3 DB's
 that could be used if it helps. The other entity, item_description  
 uses a
 datasource2 which has a pool of 2 DB's that could potentially be  
 used. Not
 sure if that would help or not.

 I might as well that the item description will have indexed, stored  
 and term
 vectors set to true.
 
 
 

I can't find any example of creating such a massive SQL query. Are there any
out there? Will batching still work with this massive query?


Re: Importing large datasets

2010-06-02 Thread Blargy

Would dumping the databases to a local file help at all?


Re: Subclassing DIH

2010-06-01 Thread Blargy

I'll give the deletedEntity trick a try... ingenious!


Importing large datasets

2010-06-01 Thread Blargy

We have around 5 million items in our index and each item has a description
located on a separate physical database. These item descriptions vary in
size and for the most part are quite large. Currently we are only indexing
items and not their corresponding description and a full import takes around
4 hours. Ideally we want to index both our items and their descriptions but
after some quick profiling I determined that a full import would take in
excess of 24 hours. 

- How would I profile the indexing process to determine whether the
bottleneck is Solr or our database?
- In either case, how would one speed up this process? Is there a way to run
parallel import processes and then merge them together at the end? Possibly
use some sort of distributed computing?

Any ideas? Thanks


Re: Sort by function workaround for Solr 1.4

2010-05-28 Thread Blargy

How would this be any different from simply using the function to alter the
scoring of the final results and then sorting by score?




Re: Need guidance on schema type

2010-05-27 Thread Blargy

There will never be any need to search the actual HTML (tags, markup, etc.),
so as far as functionality goes it seems like the DIH HTMLStripTransformer
is the way to go.

Are there any significant performance differences between the two?


Generic questions

2010-05-27 Thread Blargy

Can someone explain to me what the state of Solr/Lucene is... didn't they
recently combine?

I know I am running version 1.4, but I keep seeing version numbers out there
like 3.0 and 4.0. Can someone explain what that means?

Also, is the state of trunk (1.4 or 4.0?) good enough for production use?

Thanks!


Re: Generic questions

2010-05-27 Thread Blargy


Yonik Seeley-2-2 wrote:
 
 Lots of other stuff has changed.  For example, trunk is now always the
 next *major* version number.
 So the trunk of the combined lucene/solr is 4.0-dev
 
 There is now a branch_3x that is like trunk for all future 3.x releases.
 
 The next version of Solr will probably be 3.1, and it's unlikely there
 will ever be a 1.5 released.
 

Wait... what? Now I'm more confused.

What version is http://svn.apache.org/repos/asf/lucene/dev/trunk/? I'm
guessing it's 4.0-dev, but then where does 3.1 fit in?

Say I am running 1.4 and want to upgrade: which version should I use? If I
want to use a patch that has a fix version of 1.5, which should I be using?
(https://issues.apache.org/jira/browse/SOLR-1316)

Thanks again




Highlighting questions

2010-05-26 Thread Blargy

What are the correct settings to get highlighting excerpting working?

Original Text: The quick brown fox jumps over the lazy dog
Query: jump
Result: ... fox jumps over ...

Can you do something like the above with the highlighter, or can it only
surround matches with pre and post tags? Can someone explain what
mergeContiguous does?

Thanks.


Snapshooter question

2010-05-23 Thread Blargy

Is it possible to limit the number of snapshots taken by the replication
handler? ...http://localhost:8983/solr/replication?command=backup

Thanks


Re: DIH post import event listener for errors

2010-05-22 Thread Blargy

Awesome thanks


Re: DIH post import event listener for errors

2010-05-22 Thread Blargy

Smiley, I don't follow. Can you explain how one could do this?

I'm guessing Log4J would parse the logs looking for a ROLLBACK and then
send out a notification? Sorry, but I'm not really familiar with Log4J.

BTW, loved your book. Have you thought about putting out another, more
advanced book, possibly covering subjects such as custom request handlers,
plugins, etc.?

Thanks


Re: DIH post import event listener for errors

2010-05-22 Thread Blargy

Ok... just read up on Log4J email notification. Sounds like it would be a
good idea; however, can you have separate SMTPAppenders based on which
exception is thrown and/or by searching for a particular string?

I.e., if log level = SEVERE and the message contains "rollback", then use
SMTPAppender foo.

Thanks
 


Re: StackOverflowError during Delta-Import

2010-05-22 Thread Blargy

Narrowed down the issue to this block in DocBuilder.java, in the
collectDelta method. Any ideas?

    Set<Map<String, Object>> deletedSet = new HashSet<Map<String, Object>>();
    Set<Map<String, Object>> deltaRemoveSet = new HashSet<Map<String, Object>>();
    while (true) {
      Map<String, Object> row = entityProcessor.nextDeletedRowKey();

      if (row == null)
        break;

      // Check to see if this delete is in the current delta set
      for (Map<String, Object> modifiedRow : deltaSet) {
        if (modifiedRow.get(entity.getPk()).equals(row.get(entity.getPk()))) {
          deltaRemoveSet.add(modifiedRow);
        }
      }

      deletedSet.add(row);
      importStatistics.rowsCount.incrementAndGet();
      // check for abort
      if (stop.get())
        return new HashSet();
    }


Re: StackOverflowError during Delta-Import

2010-05-22 Thread Blargy

Forgot to mention: the entity that is causing this is the root entity.


Re: Subclassing DIH

2010-05-20 Thread Blargy

Ok to further explain myself.

Well, first off, I was experiencing a StackOverflowError during my
delta-imports after doing a full-import. The strange thing was, it only
happened sometimes. Thread is here:
http://lucene.472066.n3.nabble.com/StackOverflowError-during-Delta-Import-td811053.html#a824780

I never did find a good solution to that bug, but I did come up with a
workaround. I noticed that if I removed my deletedPkQuery then the
delta-import would work as expected. Obviously I still need to delete items
out of the index during indexing, so I wanted to subclass the
DataImportHandler to first update all documents and then delete all the
documents that my deletedPkQuery would have deleted.

I can actually accomplish the above behavior using the onImportEnd
EventListener; however, I lose the ability to know how many documents were
actually deleted, since my manual deletion of documents doesn't get picked
up in the data importer cumulativeStatistics.

My hope was that I could subclass DIH and massage the cumulativeStatistics
after my manual deletion of documents.

FYI, my manual deletion is accomplished by sending a deleteById query to an
instance of CommonsHttpSolrServer that I create from the current context of
the EventListener. Side question: how can I retrieve the # of items actually
removed from the index after a deleteById query?

Thoughts on the process? There just has to be an easier way.


Subclassing DIH

2010-05-19 Thread Blargy

I am trying to subclass DIH, but I am having a hard time getting access to
the current Solr context. How is this possible?

Is there anyway to get access to the current DataSource, DataImporter etc?

On a related note: when working with an onImportEnd or onImportStart
listener, how can I get a reference to the current request/response that
initiated the import?

From the DIH subclass I can access the request/response but not the context.
From the event listener I can access the context but not the
request/response.


DataImporter from context

2010-05-18 Thread Blargy

What's the best way to get to the instance of the DataImportHandler from the
current context?

Thanks


Re: Autosuggest

2010-05-18 Thread Blargy

Thanks for the info Hoss.

I will probably need to go with one of the more complicated solutions. Is
there any online documentation for this task? Thanks.


Deduplication

2010-05-18 Thread Blargy

Basically, for some use cases I would like to show duplicates; for others I
want them ignored.

If I have overwriteDupes=false and I just create the dedup hash, how can I
query for only unique hash values, i.e., something like a SQL GROUP BY?

Thanks



Re: StackOverflowError during Delta-Import

2010-05-17 Thread Blargy

Is there any more information I can post so someone can give me a clue
about what's happening?


Re: StackOverflowError during Delta-Import

2010-05-17 Thread Blargy

I just found out that if I remove my deletedPkQuery then the import will
work. Is it possible that there is some conflict between my delta indexing
and my delta deleting?

Any suggestions?


Re: Autosuggest

2010-05-15 Thread Blargy

Andrzej, is this ready for production usage?

"Hopefully in the future we can include user click-through rates to boost
those terms/phrases higher"
- This could be huge!


Re: Autosuggest

2010-05-15 Thread Blargy

Maybe I should have phrased it as: is this ready to be used with Solr 1.4?

Also, as Grant asked in the thread, what is the actual status of that patch?
Thanks again!


Recommended MySQL JDBC driver

2010-05-14 Thread Blargy

Which driver is best for use with Solr?

I am currently using mysql-connector-java-5.1.12-bin.jar in my production
setting. However, I recently tried downgrading and did some quick indexing
using mysql-connector-java-5.0.8-bin.jar, and I saw close to a 2x
improvement in speed!!! Unfortunately I kept getting the following error
using the 5.0.8 version:

Caused by: com.mysql.jdbc.CommunicationsException: The last communications
with the server was 474 seconds ago, which  is longer than the server
configured value of 'wait_timeout'. You should consider either expiring
and/or testing connection validity before use in your application,
increasing the server configured values for client timeouts, or using the
Connector/J connection property 'autoReconnect=true' to avoid this problem.

I tried setting autoReconnect=true in my datasource configuration, but I
keep getting the same error. Any ideas?
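
One thing worth checking (a sketch; host, db and credentials hypothetical):
Connector/J properties can also be passed on the JDBC URL itself, in case an
attribute on the dataSource element isn't reaching the driver:

<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://db-host/items?autoReconnect=true"
            user="..." password="..."/>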



Re: Recommended MySQL JDBC driver

2010-05-14 Thread Blargy

Shawn, first off thanks for the reply and links!

As far as the error in the 5.0.8 version, does the import work, or does it
fail when the exception is thrown?
- The import works for about 5-10 minutes then it fails and everything is
rolled-back one the above exception is thrown.

 You might also try doing as it says and increasing the timeout on the
server
- How is this accomplished? I tried maxWait options on the datasource in
data-config.xml but that didn't seem to work.
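(In case it helps anyone answer: the only client-side equivalent I've found
so far is raising the session timeout over JDBC, something like the sketch
below - connection details made up, and I'm not sure it survives DIH's
connection handling:)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class TimeoutBump {
  public static void main(String[] args) throws Exception {
    Connection conn = DriverManager.getConnection(
        "jdbc:mysql://dbhost:3306/mydb", "user", "password");
    Statement stmt = conn.createStatement();
    // raise the idle timeout for this session to ~8 hours
    stmt.execute("SET SESSION wait_timeout = 28800");
    stmt.close();
    conn.close();
  }
}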

I'm also torn on whether or not I should file a bug that may or may not
exist. The whole reason I tried downgrading to 5.0.8 was that during certain
(not all) delta-imports I keep getting the following error, which seems to be
entirely MySQL-related:

SEVERE: Delta Import Failed
java.lang.StackOverflowError
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3296)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1941)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2114)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2690)
at
com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1545)
at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:201)
at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:7624)
at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:908)
at com.mysql.jdbc.StatementImpl.realClose(StatementImpl.java:2364)
at
com.mysql.jdbc.ConnectionImpl.closeAllOpenStatements(ConnectionImpl.java:1583)
at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4454)
at com.mysql.jdbc.ConnectionImpl.cleanup(ConnectionImpl.java:1359)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2723)
at
com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1545)
at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:201)
at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:7624)
at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:908)
at com.mysql.jdbc.StatementImpl.realClose(StatementImpl.java:2364)
at
com.mysql.jdbc.ConnectionImpl.closeAllOpenStatements(ConnectionImpl.java:1583)
at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4454)
at com.mysql.jdbc.ConnectionImpl.cleanup(ConnectionImpl.java:1359)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2723)
 and it keeps going

Once the above exception occurs I can never delta-import again against that
index. I am then forced to do a full-import. Do you have any thoughts or
suggestions on that? Should I file this as a MySQL bug?

Thanks again for your help. I'll try playing around with the latest versions
of the connector and I'll post my results.



Re: Recommended MySQL JDBC driver

2010-05-14 Thread Blargy

Lucas, was there a reason you went with 5.1.10 or was it just the latest
version when you started your Solr project?

Also, how many items are in your index and how big is your index size?
Thanks


Autosuggest

2010-05-14 Thread Blargy

What is the preferred way to implement this feature? Using facets or the
terms component (or maybe something entirely different)? Thanks in advance!
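To make the question concrete, here is the kind of request I have in mind
for the terms component route - a rough SolrJ sketch, assuming a /terms
handler wired to TermsComponent as in the example solrconfig (field name and
prefix are made up):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SuggestTest {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery query = new SolrQuery();
    query.setQueryType("/terms");     // the TermsComponent handler
    query.set("terms", true);
    query.set("terms.fl", "title");   // field to pull suggestions from
    query.set("terms.prefix", "ipo"); // what the user has typed so far
    query.set("terms.limit", "10");
    QueryResponse response = server.query(query);
    System.out.println(response.getResponse()); // raw NamedList of terms
  }
}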


Re: Autosuggest

2010-05-14 Thread Blargy

"Easiest and oldest is wildcards on facets."
- Does this allow partial matching or is this only prefix matching?

"It and facets allow limiting the database with searches. Using the spelling
database does not allow this."
- What do you mean?

So there is no generally accepted preferred way to do auto-suggest?



Re: Autosuggest

2010-05-14 Thread Blargy

Thanks for your help and especially your analyzer... probably saved me a
full-import or two :)



Question on pf (Phrase Fields)

2010-05-13 Thread Blargy

Is there any way to configure this so it only takes effect if you match more
than one word?

For example, if I search for "foo" it should have no effect on scoring, but
if I search for "foo bar" then it should.
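The only workaround I can think of is deciding client-side, i.e. only
sending pf when the query actually contains more than one word - a sketch
(the fields and boost are made up):

import org.apache.solr.client.solrj.SolrQuery;

public class PfWorkaround {
  public static SolrQuery build(String userQuery) {
    SolrQuery query = new SolrQuery(userQuery);
    query.set("defType", "dismax");
    query.set("qf", "title description");
    // only apply the phrase boost when there is a phrase to boost
    if (userQuery.trim().split("\\s+").length > 1) {
      query.set("pf", "title^10");
    }
    return query;
  }
}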

Is this possible? Thanks


Advancded Reading

2010-05-13 Thread Blargy

Does anyone know of any documentation that is more in-depth than the wiki and
the Solr 1.4 book? I'm past the basic usage of Solr and creating simple
support plugins. I really want to know all about the inner workings of Solr
and Lucene. Can someone recommend anything?

Thanks


DIH settings

2010-05-13 Thread Blargy

Can you please share with me your DIH settings and the JDBC driver you are
using?

I'll start...

jdbc driver = mysql-connector-java-5.1.12-bin
batchSize = -1
readOnly = true


Would someone mind explaining what convertType and transactionIsolation
actually do? The wiki doesn't really explain their purpose. Thanks


Re: StackOverflowError during Delta-Import

2010-05-12 Thread Blargy

Mike,

This only happens when I attempt a delta-import after a full-import that was
not preceded by deleting the index dir.

For example, these steps work correctly:

1) Delete /home/corename/data
2) Full-Import
3) Delta-Import

However, if I attempt the following, it results in an error:

1) Delete /home/corename/data
2) Full-Import
3) Delta-Import
4) (After many successful delta-imports) Full-Import
5) Delta-Import (Error now occurs)

So it seems that this only happens after doing a full-import for a second
time.


Re: MLT Boost Function

2010-05-12 Thread Blargy

Does anyone know of any way to accomplish (or at least simulate) this?

Thanks again


StackOverflowError during Delta-Import

2010-05-11 Thread Blargy

I posted about this a few weeks ago but no one seemed to respond. Has anyone
seen this before? Why is this happening and, more importantly, how can I fix
it? Thanks in advance!

May 11, 2010 12:05:45 PM org.apache.solr.handler.dataimport.DataImporter
doDeltaImport
SEVERE: Delta Import Failed
java.lang.StackOverflowError
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3296)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1941)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2114)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2690)
at
com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1545)
at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:201)
at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:7624)
at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:908)
at com.mysql.jdbc.StatementImpl.realClose(StatementImpl.java:2364)
at
com.mysql.jdbc.ConnectionImpl.closeAllOpenStatements(ConnectionImpl.java:1583)
at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4454)
at com.mysql.jdbc.ConnectionImpl.cleanup(ConnectionImpl.java:1359)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2723)
at
com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1545)
at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:201)
at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:7624)
at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:908)
at com.mysql.jdbc.StatementImpl.realClose(StatementImpl.java:2364)
at
com.mysql.jdbc.ConnectionImpl.closeAllOpenStatements(ConnectionImpl.java:1583)
at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4454)
at com.mysql.jdbc.ConnectionImpl.cleanup(ConnectionImpl.java:1359)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2723)
at
com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1545)
at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:201)
at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:7624)
at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:908)
at com.mysql.jdbc.StatementImpl.realClose(StatementImpl.java:2364)
at
com.mysql.jdbc.ConnectionImpl.closeAllOpenStatements(ConnectionImpl.java:1583)
at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4454)
at com.mysql.jdbc.ConnectionImpl.cleanup(ConnectionImpl.java:1359)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2723)
at
com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1545)
at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:201)
at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:7624)
at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:908)
at com.mysql.jdbc.StatementImpl.realClose(StatementImpl.java:2364)
at
com.mysql.jdbc.ConnectionImpl.closeAllOpenStatements(ConnectionImpl.java:1583)
at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4454)
at com.mysql.jdbc.ConnectionImpl.cleanup(ConnectionImpl.java:1359)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2723)
at
com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1545)
at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:201)
at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:7624)
at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:908)
at com.mysql.jdbc.StatementImpl.realClose(StatementImpl.java:2364)
at
com.mysql.jdbc.ConnectionImpl.closeAllOpenStatements(ConnectionImpl.java:1583)
at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4454)
at com.mysql.jdbc.ConnectionImpl.cleanup(ConnectionImpl.java:1359)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2723)
at
com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1545)
at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:201)
at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:7624)
at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:908)
at com.mysql.jdbc.StatementImpl.realClose(StatementImpl.java:2364)
at
com.mysql.jdbc.ConnectionImpl.closeAllOpenStatements(ConnectionImpl.java:1583)
at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4454)
   

Re: StackOverflowError during Delta-Import

2010-05-11 Thread Blargy

FYI I am using the mysql-connector-java-5.1.12-bin.jar as my JDBC driver


MLT Boost Function

2010-05-11 Thread Blargy

How can one accomplish a MoreLikeThis search using boost functions?

If it's not possible out of the box, can someone point me in the right
direction on what I would need to create to get this working? Thanks
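To clarify what I'm after, the only workaround I can think of is a two-step
approach: ask the MLT handler for its interesting terms, then feed those
into a normal dismax query that does support bf. A rough SolrJ sketch (the
handler path, fields and boost function are all made up):

import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class MltWithBoost {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr");

    // step 1: ask the MLT handler for one document's interesting terms
    SolrQuery mlt = new SolrQuery("id:12345");
    mlt.setQueryType("/mlt");
    mlt.set("mlt.fl", "title");
    mlt.set("mlt.interestingTerms", "list");
    QueryResponse mltResponse = server.query(mlt);
    List<?> terms = (List<?>) mltResponse.getResponse().get("interestingTerms");

    // step 2: run a normal dismax query over those terms, with a bf
    StringBuilder q = new StringBuilder();
    for (Object t : terms) {
      String s = t.toString();           // terms come back as field:term
      q.append(s.substring(s.indexOf(':') + 1)).append(' ');
    }
    SolrQuery dismax = new SolrQuery(q.toString().trim());
    dismax.set("defType", "dismax");
    dismax.set("qf", "title");
    dismax.set("bf", "ord(popularity)"); // hypothetical boost function
    server.query(dismax);
  }
}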


Re: Custom DIH variables

2010-05-08 Thread Blargy

Thanks for the input Lance. 

My use case was actually pretty simple, so my solution was relatively simple
as well. I ended up using the HTTP method. The code is listed here:
http://pastie.org/952040. I would appreciate any comments.

iorixxx you may find this solution to be of some use to you.


CommonsHttpSolrServer vs EmbeddedSolrServer

2010-05-07 Thread Blargy

Can someone please explain the use cases where one would choose one over
the other?

All I got from the wiki was (in reference to Embedded): "If you need to use
solr in an embedded application, this is the recommended approach. It allows
you to work with the same interface whether or not you have access to HTTP."


I had a use case (detailed here:
http://lucene.472066.n3.nabble.com/Custom-DIH-variables-td777696.html#a777696)
where I tried creating a new server via the current core, but I kept getting
a SEVERE: java.util.concurrent.RejectedExecutionException... SEVERE: Too
many close [count:-3] on org.apache.solr.core.SolrCore. Maybe my
implementation was off?

Is there any detailed documentation on SolrJ usage... more than the wiki? Any
books? Thanks
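For reference, the two setups as I understand them (URL and core name are
made up):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.core.CoreContainer;

public class Servers {
  public static void main(String[] args) throws Exception {
    // over HTTP - talks to a separately running Solr instance
    SolrServer http = new CommonsHttpSolrServer("http://localhost:8983/solr");

    // embedded - runs a core in-process, no HTTP involved
    CoreContainer container = new CoreContainer.Initializer().initialize();
    SolrServer embedded = new EmbeddedSolrServer(container, "corename");
  }
}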






Re: Custom DIH variables

2010-05-07 Thread Blargy

Thanks for the tip, Lance. Just for reference, why is it dangerous to use the
HTTP method? I realize now that the embedded method is probably not the way
to go (obviously, since I was getting that SEVERE:
java.util.concurrent.RejectedExecutionException).



SEVERE: java.util.concurrent.RejectedExecutionException

2010-05-06 Thread Blargy

I am working on creating my own custom DataImportHandler Evaluator class and
I keep running into this error when trying to delta-import. The log told me
to post this exception to the mailing list, so that's what I am doing ;)

[java] SEVERE: java.util.concurrent.RejectedExecutionException
 [java] at
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1760)
 [java] at
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
 [java] at
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
 [java] at
java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:92)
 [java] at
java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:603)
 [java] at
org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1141)
 [java] at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:913)
 [java] at
org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:209)
 [java] at
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:139)
 [java] at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
 [java] at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 [java] at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
 [java] at
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
 [java] at
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
 [java] at
org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
 [java] at
com.ioffer.solr.handler.dataimport.LatestTimestampEvaluator.evaluate(Unknown
Source)
 [java] at
org.apache.solr.handler.dataimport.EvaluatorBag$5.get(EvaluatorBag.java:216)
 [java] at
org.apache.solr.handler.dataimport.EvaluatorBag$5.get(EvaluatorBag.java:202)
 [java] at
org.apache.solr.handler.dataimport.VariableResolverImpl.resolve(VariableResolverImpl.java:103)
 [java] at
org.apache.solr.handler.dataimport.TemplateString.fillTokens(TemplateString.java:81)
 [java] at
org.apache.solr.handler.dataimport.TemplateString.replaceTokens(TemplateString.java:75)
 [java] at
org.apache.solr.handler.dataimport.VariableResolverImpl.replaceTokens(VariableResolverImpl.java:87)
 [java] at
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextModifiedRowKey(SqlEntityProcessor.java:81)
 [java] at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextModifiedRowKey(EntityProcessorWrapper.java:251)
 [java] at
org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:621)
 [java] at
org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:608)
 [java] at
org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:258)
 [java] at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:172)
 [java] at
org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:352)
 [java] at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:391)
 [java] at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
 [java] 
 [java] May 6, 2010 8:04:53 PM org.apache.solr.core.SolrCore close
 [java] SEVERE: Too many close [count:-1] on
org.apache.solr.core.SolrCore@15db4492. Please report this exception to
solr-user@lucene.apache.org
 [java] May 6, 2010 8:04:53 PM org.apache.solr.core.SolrCore close
 [java] SEVERE: Too many close [count:-2] on
org.apache.solr.core.SolrCore@15db4492. Please report this exception to
solr-user@lucene.apache.org
 [java] May 6, 2010 8:04:53 PM org.apache.solr.core.SolrCore close
 [java] SEVERE: Too many close [count:-3] on
org.apache.solr.core.SolrCore@15db4492. Please report this exception to
solr-user@lucene.apache.org
 [java] May 6, 2010 8:04:53 PM org.apache.solr.common.SolrException log
 [java] SEVERE: java.util.concurrent.RejectedExecutionException
 [java] at
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1760)
 [java] at
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
 [java] at
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
 [java] at
java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:92)
 [java] at
java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:603)
 [java] at
org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1141)
 [java] at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:913)
 [java] at

Re: Custom DIH variables

2010-05-06 Thread Blargy

So I came up with the following class.

public class LatestTimestampEvaluator extends Evaluator {

  private static final Logger logger =
      Logger.getLogger(LatestTimestampEvaluator.class.getName());

  @Override
  public String evaluate(String expression, Context context) {

    List params = EvaluatorBag.parseParams(expression,
        context.getVariableResolver());
    String field = params.get(0).toString();

    SolrCore core = context.getSolrCore();
    CoreContainer container = new CoreContainer();
    container.register(core, false);
    EmbeddedSolrServer server = new EmbeddedSolrServer(container,
        core.getName());

    // sort the whole index by the given field and take the newest value
    SolrQuery query = new SolrQuery("*:*");
    query.addSortField(field, SolrQuery.ORDER.desc);
    query.setRows(1);

    try {
      QueryResponse response = server.query(query);

      SolrDocument document = response.getResults().get(0);
      Date date = (Date) document.getFirstValue(field);
      String timestamp = new Timestamp(date.getTime()).toString();
      logger.info(timestamp);

      return timestamp;
    } catch (Exception exception) {
      logger.severe(exception.getMessage());
      logger.severe(DocumentUtils.stackTraceToString(exception));

      return null;
    } finally {
      // note: this closes the live core (decrementing its reference count),
      // which may be the source of the "Too many close" errors below
      core.close();
      container.shutdown();
    }
  }
}

and I am calling it within my dataconfig file like so...

<dataConfig>
  <function name="latest_timestamp"
      class="com.mycompany.solr.handler.dataimport.LatestTimestampEvaluator"/>
  ...

  <entity name="item" ...
      deltaQuery="select id from items where updated_on >
          '${dataimporter.functions.latest_timestamp('updated_on')}'">
      ...
  </entity>
</dataConfig>

I was hoping someone could:

1) Comment on the above class. How does it suck? This was my first time
working with SolrJ.
2) It seems to work fine when there is only one entity using that function,
but when there are multiple entities using it (which is my use case) I get
a SEVERE: java.util.concurrent.RejectedExecutionException.
Can someone explain why this is happening and how I can fix it? I added the
full stack trace to a separate thread here:
http://lucene.472066.n3.nabble.com/SEVERE-java-util-concurrent-RejectedExecutionException-tp782768p782768.html

Thanks for your help!


Re: SEVERE: java.util.concurrent.RejectedExecutionException

2010-05-06 Thread Blargy

FYI, the code that is causing this exception and an explanation of my
specific use case is all listed in this thread:
http://lucene.472066.n3.nabble.com/Custom-DIH-variables-td777696.html


Re: Custom DIH variables

2010-05-05 Thread Blargy

Thanks Paul, that will certainly work. I was just hoping there was a way I
could write my own class that would inject this value as needed instead of
precomputing this value and then passing it along in the params.

My specific use case is instead of using dataimporter.last_index_time I want
to use something like dataimporter.updated_time_of_last_document. Our DIH is
set up to use a bunch of slave databases and there have been problems with
some documents getting lost due to replication lag. 

I would prefer to compute this value using a custom variable at runtime
instead of passing it along via the params. Is that even possible? If not,
I'll have to go with your previous suggestion.

Thanks


Re: Custom DIH variables

2010-05-05 Thread Blargy

Thanks Noble this is exactly what I was looking for.

What is the preferred way to query Solr within these sorts of classes?
Should I grab the core from the context that is being passed in? Should I be
using SolrJ?

Can you provide an example and/or point me to some tutorials/documentation?

Once again, thanks!


Custom DIH EventListeners

2010-05-05 Thread Blargy

I know one can create custom event listeners for update or query events, but
is it possible to create one for any DIH event (Full-Import, Delta-Import)?
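The closest thing I've found so far is the DIH EventListener interface,
wired up via the onImportStart/onImportEnd attributes on the document
element of data-config.xml. If I'm reading the code right, a listener is
just a sketch like this:

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.EventListener;

public class ImportLogger implements EventListener {
  // called by DIH when the configured import event fires
  public void onEvent(Context ctx) {
    System.out.println("import event fired");
  }
}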

Thanks


Custom DIH variables

2010-05-04 Thread Blargy

Can someone please point me in the right direction (classes) on how to
create my own custom DIH variable that can be used in my data-config.xml.

So instead of ${dataimporter.last_index_time} I want to be able to create
${dataimporter.foo}.

Thanks


Random Field

2010-05-01 Thread Blargy

Can someone explain a useful case for the RandomSortField? 

<!-- The RandomSortField is not used to store or search any
     data.  You can declare fields of this type in your schema
     to generate pseudo-random orderings of your docs for sorting
     purposes.  The ordering is generated based on the field name
     and the version of the index.  As long as the index version
     remains unchanged, and the same field name is reused,
     the ordering of the docs will be consistent.
     If you want different pseudo-random orderings of documents,
     for the same version of the index, use a dynamicField and
     change the name.
  -->
<fieldType name="random" class="solr.RandomSortField" indexed="true" />
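(The one use I can think of is returning results in a random order, e.g.
rotating featured items: declare a dynamicField of this type and sort on a
differently named instance per request. A sketch:)

import org.apache.solr.client.solrj.SolrQuery;

public class RandomSort {
  public static SolrQuery randomized(String q) {
    SolrQuery query = new SolrQuery(q);
    // assumes schema.xml declares a dynamicField named random_* of this type;
    // a different seed in the field name gives a different ordering
    query.addSortField("random_" + System.currentTimeMillis(),
        SolrQuery.ORDER.asc);
    return query;
  }
}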


Re: Boost function on *:*

2010-04-26 Thread Blargy

Correct, I am using dismax by default.

I actually accomplished what I was looking for by creating a separate
request handler with a defType of lucene, and then I used the _val_ hook.

I tried using the {!func} function as you describe but couldn't get it to
work. Is there any difference between the two?
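For the archives, these are the two forms I was comparing (ord(popularity)
is just a stand-in function):

import org.apache.solr.client.solrj.SolrQuery;

public class FuncForms {
  public static void main(String[] args) {
    // _val_ embeds a function query inside a regular lucene-parser query
    SolrQuery a = new SolrQuery("_val_:\"ord(popularity)\"");

    // {!func} switches the whole query string to the function query parser
    SolrQuery b = new SolrQuery("{!func}ord(popularity)");
  }
}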




Boost function on *:*

2010-04-23 Thread Blargy

Is it possible to use a boost function across the whole index / an empty
search term?

I'm guessing the next question would be "Why would you want to do that?"
Well, we have a bunch of custom business metrics included in each document
(a product). I would like to show only the best products (based on our
metrics and some boost functions) in the absence of a search term.

Is this possible?
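To make it concrete, this is the shape of what I'm trying to do with
dismax, as I understand q.alt (the boost function and field are made up):

import org.apache.solr.client.solrj.SolrQuery;

public class BestProducts {
  public static SolrQuery noTermQuery() {
    SolrQuery query = new SolrQuery();
    query.set("defType", "dismax");
    // q.alt kicks in when q is absent: match everything...
    query.set("q.alt", "*:*");
    // ...and let the boost function order the docs
    query.set("bf", "ord(popularity)^0.5");
    return query;
  }
}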


Highlighting apostrophe

2010-04-19 Thread Blargy

I have the following text field:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="1"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
...
<field name="title" stored="true" termVectors="true" type="text"
       multiValued="true" indexed="true"/>


When I search for "women's", "womens" or "women" I correctly get back all
the results I want. However, when I use the highlighting feature it only
highlights "women" in the "women's" cases. How can I highlight the whole
word "women's", including the apostrophe?

Thanks


Re: Highlighting apostrophe

2010-04-19 Thread Blargy

Same general question about highlighting the full word "sunglasses" when I
search for "glasses". Is this possible?

Thanks

