Solr Cloud Using Docker

2016-09-16 Thread Brendan Grainger
Hi,

Has anyone used Docker for deploying Solr? I am using it to run a single-server
Solr ‘cloud’ locally on my dev box, but I'm wondering about the pros/cons of
using it in production.
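
For context, the way I run it locally is roughly the following (treat it as a
sketch: the image tag and the -c cloud-mode pass-through are assumptions, so
check the docker-solr docs for your version):

# Single-node SolrCloud on a dev box using the official image; -c starts
# Solr in cloud mode with an embedded ZooKeeper.
docker run -d --name solr-dev -p 8983:8983 solr:6.2 solr-foreground -c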

Thanks,
Brendan

state.json base_url has internal IP of ec2 instance set instead of 'public DNS' entry in some cases

2016-01-15 Thread Brendan Grainger
Hi,

I am creating a new collection using the following get request:

http://ec2_host:8983/solr/admin/collections?action=CREATE&name=collection_name_1&collection.configName=oem/conf&numShards=1

What I’m finding is that now and then base_url for the replica in state.json is
set to the internal IP of the AWS node. i.e.:

"base_url":"http://10.29.XXX.XX:8983/solr",

On other attempts it’s set to the public DNS name of the node:

"base_url":"http://ec2_host:8983/solr",

In my /etc/defaults/solr.in.sh I have:

SOLR_HOST="ec2_host"

which I thought is what I needed to get the public DNS name set in base_url. 

Am I doing this incorrectly, or is there something I’m missing here? The issue
this causes is that ZooKeeper hands the internal IP back to my indexing
processes whenever base_url is set to it, and they then can’t find the server.
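
A quick way to check what base_url each replica actually registered is the
Collections API CLUSTERSTATUS call (no collection name given here, so it
reports everything):

curl "http://ec2_host:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json"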

Thanks!


Re: state.json base_url has internal IP of ec2 instance set instead of 'public DNS' entry in some cases

2016-01-15 Thread Brendan Grainger
Hi Hoss,

Thanks for the reply. I installed the service using the install script. I
double-checked it and it looks like it installs solr.in.sh in
/etc/defaults/solr.in.sh. It actually looks like, if it is in /var, the install
script moves it into /etc/defaults (unless I’m reading this wrong):

https://github.com/apache/lucene-solr/blob/trunk/solr/bin/install_solr_service.sh#L281

I checked the process and even on restarts it looks like this:

ps aux | grep solr
  my_solr_user  9522  0.2  1.5 3010216 272656 ?  Sl   20:06   0:26 
/usr/lib/jvm/java-8-oracle/bin/java -server -Xms512m -Xmx512m -XX:NewRatio=3 
-XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4 
-XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark 
-XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly 
-XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 
-XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc 
-XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
-XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution 
-XX:+PrintGCApplicationStoppedTime -Xloggc:/var/solr/logs/solr_gc.log 
-DzkClientTimeout=15000 -DzkHost=ec2_host:2181/solr -Djetty.port=8983 
-DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Dhost=ec2_host -Duser.timezone=UTC 
-Djetty.home=/opt/solr/server -Dsolr.solr.home=/var/solr/data 
-Dsolr.install.dir=/opt/solr 
-Dlog4j.configuration=file:/var/solr/log4j.properties -Xss256k -jar start.jar 
-XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs 
--module=http

Note: I replaced the user I’m running it as with my_solr_user, and the actual
ec2 public DNS name with ec2_host, in the output above.

I am new to SolrCloud so it’s more than likely I’ve screwed up some 
configuration setting somewhere.

Thank you for your help,
Brendan

> On Jan 15, 2016, at 6:07 PM, Chris Hostetter  wrote:
> 
> 
> : What I’m finding is that now and then base_url for the replica in 
> : state.json is set to the internal IP of the AWS node. i.e.:
> : 
> : "base_url":"http://10.29.XXX.XX:8983/solr",
> : 
> : On other attempts it’s set to the public DNS name of the node:
> : 
> : "base_url":"http://ec2_host:8983/solr",
> : 
> : In my /etc/defaults/solr.in.sh I have:
> : 
> : SOLR_HOST="ec2_host"
> : 
> : which I thought is what I needed to get the public DNS name set in 
> base_url. 
> 
> i believe you are correct.  the "now and then" part of your question is 
> weird -- it seems to indicate that sometimes the "correct" thing is 
> happening, and other times it is not.  
> 
> /etc/defaults/solr.in.sh isn't the canonical path for solr.in.sh 
> according to the docs/install script for running a production solr 
> instance...
> 
> https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production#TakingSolrtoProduction-ServiceInstallationScript
> 
> ...how *exactly* are you running Solr on all of your nodes?
> 
> because my guess is that you've got some kind of inconsistent setup where 
> sometimes when you startup (or restart) a new node it does refer to your 
> solr.in.sh file, and other times it does not -- so sometimes solr never 
> sees your SOLR_HOST option.  In those cases, when it registers itself with
> ZooKeeper it uses the current IP as a fallback, and then that info gets
> baked into the metadata for the replicas that get created on that node
> at that point in time.
> 
> FWIW, you should be able to spot check that the SOLR_HOST is being applied 
> correctly by looking at the java process command line args (using ps, or
> loading the Solr UI in your browser) and checking for the "-Dhost=..."
> option -- if it's not there, then your solr.in.sh probably wasn't read in
> correctly
> 
> 
> 
> -Hoss
> http://www.lucidworks.com/



Re: state.json base_url has internal IP of ec2 instance set instead of 'public DNS' entry in some cases

2016-01-15 Thread Brendan Grainger
Hi Hoss,

Thanks for your help. Going over the install page again I realized I had
originally not adjusted the value of SOLR_HOST, and it had started up using the
default internal IP. I changed that to the public DNS name and restarted Solr.
However, in /live_nodes I then had two entries: one for the public DNS name and
one for the internal IP; it looks like the stale entry didn’t get removed. I
removed it using the ZooKeeper CLI and all is working fine now.
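
In case it helps anyone else, the cleanup was roughly the following (the /solr
chroot matches the -DzkHost setting from my earlier mail; adjust the paths to
your setup):

# Inspect and remove the stale live_nodes entry with the ZooKeeper CLI:
./zkCli.sh -server ec2_host:2181
ls /solr/live_nodes
delete /solr/live_nodes/10.29.XXX.XX:8983_solr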

I’m unsure, but I wonder if the behavior I saw is somehow related to this:
http://www.gossamer-threads.com/lists/lucene/java-dev/297790
However, as I said, I’m pretty new to this so I could be completely wrong.

Thanks again
Brendan


> On Jan 15, 2016, at 6:07 PM, Chris Hostetter  wrote:
> 
> 
> : What I’m finding is that now and then base_url for the replica in 
> : state.json is set to the internal IP of the AWS node. i.e.:
> : 
> : "base_url":"http://10.29.XXX.XX:8983/solr",
> : 
> : On other attempts it’s set to the public DNS name of the node:
> : 
> : "base_url":"http://ec2_host:8983/solr",
> : 
> : In my /etc/defaults/solr.in.sh I have:
> : 
> : SOLR_HOST="ec2_host"
> : 
> : which I thought is what I needed to get the public DNS name set in 
> base_url. 
> 
> i believe you are correct.  the "now and then" part of your question is 
> weird -- it seems to indicate that sometimes the "correct" thing is 
> happening, and other times it is not.  
> 
> /etc/defaults/solr.in.sh isn't the canonical path for solr.in.sh 
> according to the docs/install script for running a production solr 
> instance...
> 
> https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production#TakingSolrtoProduction-ServiceInstallationScript
> 
> ...how *exactly* are you running Solr on all of your nodes?
> 
> because my guess is that you've got some kind of inconsistent setup where 
> sometimes when you startup (or restart) a new node it does refer to your 
> solr.in.sh file, and other times it does not -- so sometimes solr never 
> sees your SOLR_HOST option.  In those cases, when it registers itself with
> ZooKeeper it uses the current IP as a fallback, and then that info gets
> baked into the metadata for the replicas that get created on that node
> at that point in time.
> 
> FWIW, you should be able to spot check that the SOLR_HOST is being applied 
> correctly by looking at the java process command line args (using ps, or
> loading the Solr UI in your browser) and checking for the "-Dhost=..."
> option -- if it's not there, then your solr.in.sh probably wasn't read in
> correctly
> 
> 
> 
> -Hoss
> http://www.lucidworks.com/



Re: Solr replicating at 5 mb/sec

2014-02-22 Thread Brendan Grainger
First thing I'd check is just transferring a large file (created with dd or
something) over the network, to make sure it's Solr that is the issue.
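
Something like this, with the host names as placeholders:

# Create a 1 GB test file and time copying it to the slave:
dd if=/dev/zero of=/tmp/bigfile bs=1M count=1024
time scp /tmp/bigfile user@slave_host:/tmp/
# Or, if iperf is installed, measure raw throughput directly:
#   iperf -s              # on the slave
#   iperf -c master_host  # on the master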



On Sat, Feb 22, 2014 at 8:45 AM, Cool Techi cooltec...@outlook.com wrote:

 Hi,
 I am running solr replication between two machines which are connected by a
 1 Gb network link. The best speed I am getting for replication is 5 MB/sec;
 how can this be increased?
 The replication keeps failing, and this is the first time we are replicating
 an index over 300 GB in size. We are using solr version 4.1 on the master and
 solr 4.3 on the slave.
 Regards, Ayush




-- 
Brendan Grainger
www.kuripai.com


Re: solr built with maven

2013-08-23 Thread Brendan Grainger
Do you want to change the solr source code itself, or do you want to create
your own Tokenizers and things? If the latter, why not just set up solr as a
dependency in your pom.xml like so:

 <dependency>
   <groupId>org.apache.lucene</groupId>
   <artifactId>lucene-test-framework</artifactId>
   <scope>test</scope>
   <version>${solr.version}</version>
 </dependency>

 <dependency>
   <groupId>org.apache.solr</groupId>
   <artifactId>solr-test-framework</artifactId>
   <scope>test</scope>
   <version>${solr.version}</version>
 </dependency>

 <dependency>
   <groupId>org.apache.lucene</groupId>
   <artifactId>lucene-core</artifactId>
   <version>${solr.version}</version>
 </dependency>

 <dependency>
   <groupId>org.apache.lucene</groupId>
   <artifactId>lucene-facet</artifactId>
   <version>${solr.version}</version>
 </dependency>

 <dependency>
   <groupId>org.apache.solr</groupId>
   <artifactId>solr</artifactId>
   <version>${solr.version}</version>
   <type>war</type>
 </dependency>

 <dependency>
   <groupId>org.apache.solr</groupId>
   <artifactId>solr-core</artifactId>
   <version>${solr.version}</version>
 </dependency>

 <dependency>
   <groupId>org.apache.solr</groupId>
   <artifactId>solr-solrj</artifactId>
   <version>${solr.version}</version>
 </dependency>

 <dependency>
   <groupId>org.apache.solr</groupId>
   <artifactId>solr-langid</artifactId>
   <version>${solr.version}</version>
 </dependency>

 <dependency>
   <groupId>log4j</groupId>
   <artifactId>log4j</artifactId>
   <version>1.2.16</version>
 </dependency>

 <dependency>
   <groupId>commons-cli</groupId>
   <artifactId>commons-cli</artifactId>
   <version>1.2</version>
 </dependency>

 <dependency>
   <groupId>javax.servlet</groupId>
   <artifactId>servlet-api</artifactId>
   <version>2.5</version>
 </dependency>


On Fri, Aug 23, 2013 at 12:24 PM, Bruno René Santos brunor...@gmail.com wrote:

 Hello all,

 I am building Solr's source code through maven in order to develop on top
 of it in NetBeans (as no Ant task was made for NetBeans... not cool!).

 Three questions about that:

 1. How can I execute the solr server?
 2. How can I debug the solr server?
 3. If I create new packages (RequestHandlers, Tokenizers, etc.) where can I
 put them so that the compilation process will see the new files?

 Regards
 Bruno Santos

 --
 Bruno René Santos
 Lisboa - Portugal




-- 
Brendan Grainger
www.kuripai.com


Re: Solr Ref guide question

2013-08-22 Thread Brendan Grainger
What version of solr are you using? Have you copied a solr.xml from
somewhere else? I can almost reproduce the error you're getting if I put a
non-existent core in my solr.xml, e.g.:

<solr>

  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="a_non_existent_core" />
  </cores>
...


On Thu, Aug 22, 2013 at 1:30 PM, yriveiro yago.rive...@gmail.com wrote:

 Hi all,

 I think there is something missing in solr's ref doc.

 The "Running Solr" section says to run solr using the command:

 $ java -jar start.jar

 But If I do this with a fresh install, I have a stack trace like this:
 http://pastebin.com/5YRRccTx

 Is this behavior expected?



 -
 Best regards
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-Ref-guide-question-tp4086142.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Brendan Grainger
www.kuripai.com


Re: Where is the webapps directory of servlet container?

2013-08-16 Thread Brendan Grainger
Hi,

Slightly off topic, but just wondering if you've worked through the tutorial:
https://lucene.apache.org/solr/4_4_0/tutorial.html. You can then use the
packaged Jetty servlet container while you get comfortable with working with
solr.

Best of luck
Brendan



On Fri, Aug 16, 2013 at 12:25 PM, Kamaljeet Kaur kamal.kaur...@gmail.com wrote:

 On Fri, Aug 16, 2013 at 1:20 PM, Artem Karpenko [via Lucene]
 ml-node+s472066n4084995...@n3.nabble.com wrote:
  it's also mentioned on that page that Solr runs inside a Java servlet
  container such as Tomcat, Jetty, or Resin - you have to install one of
  those first.


 Ok.
 Can you please suggest a servlet container to use with Django 1.4
 and solr version 4.4.0?
 And which version of it?

 --
 Kamaljeet Kaur

 kamalkaur188.wordpress.com
 facebook.com/kaur.188




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Where-is-the-webapps-directory-of-servlet-container-tp4084968p4085094.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Brendan Grainger
www.kuripai.com


Re: Where is the webapps directory of servlet container?

2013-08-16 Thread Brendan Grainger
Assuming you have downloaded solr into a dir called 'solr', then if you look
in the 'example' directory there is a bundled Jetty installation ready to roll
for testing etc. So, to answer your question 'why jetty?': have you worked
through the tutorial?
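
If not, the short version is (paths assume the stock download layout):

cd solr/example
java -jar start.jar
# then point a browser at http://localhost:8983/solr/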



On Fri, Aug 16, 2013 at 2:06 PM, Kamaljeet Kaur kamal.kaur...@gmail.com wrote:

 On Fri, Aug 16, 2013 at 10:22 PM, Brendan Grainger [via Lucene]
 ml-node+s472066n4085100...@n3.nabble.com wrote:
  You can then
  use the packaged Jetty servlet container while you get comfortable with
  working with solr.


 Can I ask why jetty?

 --
 Kamaljeet Kaur

 kamalkaur188.wordpress.com
 facebook.com/kaur.188




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Where-is-the-webapps-directory-of-servlet-container-tp4084968p4085135.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Brendan Grainger
www.kuripai.com


Synonym Expansion in Spellchecking Field Solr 4.3.1

2013-08-15 Thread Brendan Grainger
Hi All,

I've been debugging an issue where the query 'tpms' would make the
spellchecker throw the following exception:

21021 [qtp91486057-17] ERROR org.apache.solr.servlet.SolrDispatchFilter  –
null:java.lang.StringIndexOutOfBoundsException: String index out of range:
-1
at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:789)
at java.lang.StringBuilder.replace(StringBuilder.java:266)
at
org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:190)
at
org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:75)


I have the following synonyms defined for tpms:

tpms,service tire monitor,tire monitor,tire pressure monitor,tire pressure
monitoring system,tpm,low tire warning,tire pressure monitor system

Note that if you query any of the other synonyms there is no issue, only
tpms.

Looking at my field definition for my spellchecker I realized I am doing
query time synonym expansion:

<fieldType name="text_spell" class="solr.TextField"
    positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
  </analyzer>
</fieldType>

I copied this field definition from:
http://wiki.apache.org/solr/SpellCheckingAnalysis. As the issue seemed
related to synonyms I removed the SynonymFilterFactory and everything
works.

I'm going to try to create a reproducible test case for the crash, but
right now I'm wondering what I lose by not having synonym expansion when
spell checking?

Thanks
Brendan


Re: Synonym Expansion in Spellchecking Field Solr 4.3.1

2013-08-15 Thread Brendan Grainger
Further to this. If I change:

tpms,service tire monitor,tire monitor,tire pressure monitor,tire pressure
monitoring system,tpm,low tire warning,tire pressure monitor system

to

service tire monitor,tire monitor,tire pressure monitor,tire pressure
monitoring system,tpm,low tire warning,tire pressure monitor system,tpms

I don't get a crash. I tried it with some other fields too. e.g.:

asdm,airbag system diagnostic module => crash

airbag system diagnostic module,asdm => no crash

Thanks
Brendan



On Thu, Aug 15, 2013 at 1:37 PM, Brendan Grainger 
brendan.grain...@gmail.com wrote:

 Hi All,

 I've been debugging an issue where the query 'tpms' would make the
 spellchecker throw the following exception:

 21021 [qtp91486057-17] ERROR org.apache.solr.servlet.SolrDispatchFilter  –
 null:java.lang.StringIndexOutOfBoundsException: String index out of range:
 -1
  at
 java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:789)
 at java.lang.StringBuilder.replace(StringBuilder.java:266)
  at
 org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:190)
 at
 org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:75)


 I have the following synonyms defined for tpms:

 tpms,service tire monitor,tire monitor,tire pressure monitor,tire pressure
 monitoring system,tpm,low tire warning,tire pressure monitor system

 Note that if you query any of the other synonyms there is no issue, only
 tpms.

 Looking at my field definition for my spellchecker I realized I am doing
 query time synonym expansion:

 <fieldType name="text_spell" class="solr.TextField"
     positionIncrementGap="100" omitNorms="true">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="lang/stopwords_en.txt"
             enablePositionIncrements="true"
             />
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StandardFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="lang/stopwords_en.txt"
             enablePositionIncrements="true"
             />
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StandardFilterFactory"/>
   </analyzer>
 </fieldType>

 I copied this field definition from:
 http://wiki.apache.org/solr/SpellCheckingAnalysis. As the issue seemed
 related to synonyms I removed the SynonymFilterFactory and everything
 works.

 I'm going to try to create a reproducible test case for the crash, but
 right now I'm wondering what I lose by not having synonym expansion when
 spell checking?

 Thanks
 Brendan





-- 
Brendan Grainger
www.kuripai.com


Re: Synonym Expansion in Spellchecking Field Solr 4.3.1

2013-08-15 Thread Brendan Grainger
Hi All,

I didn't have the lucene-solr source compiling cleanly in eclipse initially,
so I created a very quick maven project to demonstrate this issue:

https://github.com/rainkinz/solr_spellcheck_index_out_of_bounds.git

Having said that I just got everything set up in eclipse, so I can create a
test case if this is actually an issue and not something weird with my
configuration.

Thanks
Brendan



On Thu, Aug 15, 2013 at 1:43 PM, Brendan Grainger 
brendan.grain...@gmail.com wrote:

 Further to this. If I change:

 tpms,service tire monitor,tire monitor,tire pressure monitor,tire pressure
 monitoring system,tpm,low tire warning,tire pressure monitor system

 to

 service tire monitor,tire monitor,tire pressure monitor,tire pressure
 monitoring system,tpm,low tire warning,tire pressure monitor system,tpms

 I don't get a crash. I tried it with some other fields too. e.g.:

 asdm,airbag system diagnostic module => crash

 airbag system diagnostic module,asdm => no crash

 Thanks
 Brendan



 On Thu, Aug 15, 2013 at 1:37 PM, Brendan Grainger 
 brendan.grain...@gmail.com wrote:

 Hi All,

 I've been debugging an issue where the query 'tpms' would make the
 spellchecker throw the following exception:

 21021 [qtp91486057-17] ERROR org.apache.solr.servlet.SolrDispatchFilter
  – null:java.lang.StringIndexOutOfBoundsException: String index out of
 range: -1
  at
 java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:789)
 at java.lang.StringBuilder.replace(StringBuilder.java:266)
  at
 org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:190)
 at
 org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:75)


 I have the following synonyms defined for tpms:

 tpms,service tire monitor,tire monitor,tire pressure monitor,tire
 pressure monitoring system,tpm,low tire warning,tire pressure monitor system

 Note that if you query any of the other synonyms there is no issue, only
 tpms.

 Looking at my field definition for my spellchecker I realized I am doing
 query time synonym expansion:

 <fieldType name="text_spell" class="solr.TextField"
     positionIncrementGap="100" omitNorms="true">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="lang/stopwords_en.txt"
             enablePositionIncrements="true"
             />
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StandardFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="lang/stopwords_en.txt"
             enablePositionIncrements="true"
             />
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StandardFilterFactory"/>
   </analyzer>
 </fieldType>

 I copied this field definition from:
 http://wiki.apache.org/solr/SpellCheckingAnalysis. As the issue seemed
 related to synonyms I removed the SynonymFilterFactory and everything
 works.

 I'm going to try to create a reproducible test case for the crash, but
 right now I'm wondering what I lose by not having synonym expansion when
 spell checking?

 Thanks
  Brendan





 --
 Brendan Grainger
 www.kuripai.com




-- 
Brendan Grainger
www.kuripai.com


Re: 'Optimizing' Solr Index Size

2013-08-07 Thread Brendan Grainger
Thanks Erick, our index is relatively static. I think the deletes must be
coming from 'reindexing' the same documents, so it's definitely handy to
recover the space. I've seen that video before. Definitely very interesting.
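
For anyone else wanting to spot-check this without the admin UI, the Luke
request handler reports the same numbers ('mycore' is a placeholder for the
core name):

# numDocs vs deletedDocs for the core:
curl "http://localhost:8983/solr/mycore/admin/luke?numTerms=0&wt=json"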

Brendan


On Wed, Aug 7, 2013 at 8:04 AM, Erick Erickson erickerick...@gmail.com wrote:

 The general advice is to not merge (optimize) unless your
 index is relatively static. You're quite correct, optimizing
 simply recovers the space from deleted documents, otherwise
 it won't change much (except having fewer segments).

 Here's a _great_ video that Mike McCandless put together:

 http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

 But in general _whenever_ segments are merged, the
 resulting segment will have all the data from deleted docs
 removed, and segments are merged continually when
 data is being added to the index.

 Quick-n-dirty way to estimate the space savings
 optimize will give you. Look at the admin page for the core and
 the ratio of deleted docs to numDocs is about the unused
 space that would be regained by an optimize. From there it's
  your call <G>...

 Best
 Erick


 On Tue, Aug 6, 2013 at 12:02 PM, Brendan Grainger 
 brendan.grain...@gmail.com wrote:

  To maybe answer another one of my questions about the 50Gb recovered when
  running:
 
  curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false'
 
  It looks to me that it was from deleted docs being completely removed
 from
  the index.
 
  Thanks
 
 
 
  On Tue, Aug 6, 2013 at 11:45 AM, Brendan Grainger 
  brendan.grain...@gmail.com wrote:
 
   Well, I guess I can answer one of my questions which I didn't exactly
   explicitly state, which is: how do I force solr to merge segments to a
   given maximum. I forgot about doing this:
  
    curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false'
  
   which reduced the number of segments in my index from 12 to 10.
  Amazingly,
   it also reduced the space used by almost 50Gb. Is that even possible?
  
   Thanks again
   Brendan
  
  
  
   On Tue, Aug 6, 2013 at 10:55 AM, Brendan Grainger 
   brendan.grain...@gmail.com wrote:
  
   Hi All,
  
    First of all, what I'm actually trying to do is get a little space back.
    So if there is a better way to do this by adjusting the MergePolicy or
    something else please let me know. My index is currently 200Gb. In the
    past (Solr 1.4) we've found that optimizing the index will double the
    size of the index temporarily, then usually when it's done we end up
    with a smaller index and slightly faster search query times.
   
    Should I even bother optimizing? My impression was that with the
    TieredMergePolicy this would be less necessary. Would merging segments
    into larger ones save any space and if so is there a way to tell solr
    to do that?
  
   Thanks
   Brendan
  
  
  
  
   --
   Brendan Grainger
   www.kuripai.com
  
 
 
 
  --
  Brendan Grainger
  www.kuripai.com
 




-- 
Brendan Grainger
www.kuripai.com


'Optimizing' Solr Index Size

2013-08-06 Thread Brendan Grainger
Hi All,

First of all, what I'm actually trying to do is get a little
space back. So if there is a better way to do this by adjusting the
MergePolicy or something else please let me know. My index is currently
200Gb. In the past (Solr 1.4) we've found that optimizing the index will
double the size of the index temporarily then usually when it's done we end
up with a smaller index and slightly faster search query times.

Should I even bother optimizing? My impression was that with the
TieredMergePolicy this would be less necessary. Would merging segments into
larger ones save any space and if so is there a way to tell solr to do that?

Thanks
Brendan


Re: 'Optimizing' Solr Index Size

2013-08-06 Thread Brendan Grainger
Well, I guess I can answer one of my questions which I didn't exactly
explicitly state, which is: how do I force solr to merge segments to a
given maximum. I forgot about doing this:

curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false'

which reduced the number of segments in my index from 12 to 10. Amazingly,
it also reduced the space used by almost 50Gb. Is that even possible?

Thanks again
Brendan



On Tue, Aug 6, 2013 at 10:55 AM, Brendan Grainger 
brendan.grain...@gmail.com wrote:

 Hi All,

 First of all, what I'm actually trying to do is get a little
 space back. So if there is a better way to do this by adjusting the
 MergePolicy or something else please let me know. My index is currently
 200Gb. In the past (Solr 1.4) we've found that optimizing the index will
 double the size of the index temporarily then usually when it's done we end
 up with a smaller index and slightly faster search query times.

 Should I even bother optimizing? My impression was that with the
 TieredMergePolicy this would be less necessary. Would merging segments into
 larger ones save any space and if so is there a way to tell solr to do that?

 Thanks
 Brendan




-- 
Brendan Grainger
www.kuripai.com


Re: 'Optimizing' Solr Index Size

2013-08-06 Thread Brendan Grainger
To maybe answer another one of my questions about the 50Gb recovered when
running:

curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false'

It looks to me that it was from deleted docs being completely removed from
the index.

Thanks



On Tue, Aug 6, 2013 at 11:45 AM, Brendan Grainger 
brendan.grain...@gmail.com wrote:

 Well, I guess I can answer one of my questions which I didn't exactly
 explicitly state, which is: how do I force solr to merge segments to a
 given maximum. I forgot about doing this:

 curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false'

 which reduced the number of segments in my index from 12 to 10. Amazingly,
 it also reduced the space used by almost 50Gb. Is that even possible?

 Thanks again
 Brendan



 On Tue, Aug 6, 2013 at 10:55 AM, Brendan Grainger 
 brendan.grain...@gmail.com wrote:

 Hi All,

 First of all, what I'm actually trying to do is get a little
 space back. So if there is a better way to do this by adjusting the
 MergePolicy or something else please let me know. My index is currently
 200Gb. In the past (Solr 1.4) we've found that optimizing the index will
 double the size of the index temporarily then usually when it's done we end
 up with a smaller index and slightly faster search query times.

 Should I even bother optimizing? My impression was that with the
 TieredMergePolicy this would be less necessary. Would merging segments into
 larger ones save any space and if so is there a way to tell solr to do that?

 Thanks
 Brendan




 --
 Brendan Grainger
 www.kuripai.com




-- 
Brendan Grainger
www.kuripai.com


Spellcheck field element and collation issues

2013-07-23 Thread Brendan Grainger
Hi All,

I have an IndexBasedSpellChecker component configured as follows (note the
field parameter is set to the spellcheck field):

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

  <str name="queryAnalyzerFieldType">text_spell</str>

  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <!--
    Load tokens from the following field for spell checking,
    analyzer for the field's type as defined in schema.xml are used
    -->
    <str name="field">spellcheck</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <float name="thresholdTokenFrequency">.0001</float>
  </lst>
</searchComponent>

with the corresponding field type for spellcheck:

<fieldType name="text_spell" class="solr.TextField"
    positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="moto_synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
  </analyzer>
</fieldType>

and field:

<!-- spellcheck field is multivalued because it has the title and markup
     fields copied into it -->
<field name="spellcheck" type="text_spell" stored="false"
       omitTermFreqAndPositions="true" multiValued="true"/>

values from a markup and title field are copied into the spellcheck field.

My /select search component has the following defaults:

<lst name="defaults">
  <str name="echoParams">explicit</str>
  <int name="rows">10</int>
  <str name="df">markup_texts title_texts</str>

  <!-- Spell checking defaults -->
  <str name="spellcheck">true</str>
  <str name="spellcheck.collateExtendedResults">true</str>
  <str name="spellcheck.extendedResults">true</str>
  <str name="spellcheck.maxCollations">2</str>
  <str name="spellcheck.maxCollationTries">5</str>
  <str name="spellcheck.count">5</str>
  <str name="spellcheck.collate">true</str>

  <str name="spellcheck.maxResultsForSuggest">5</str>
  <str name="spellcheck.alternativeTermCount">5</str>

</lst>


When I issue a search like this:

http://localhost:8981/solr/articles/select?indent=true&spellcheck.q=markup_texts:(Perfrm%20HVC)&q=Perfrm%20HVC&rows=0

I get collations:

<lst name="collation">
  <str name="collationQuery">markup_texts:(perform hvac)</str>
  <int name="hits">4</int>
  <lst name="misspellingsAndCorrections">
    <str name="perfrm">perform</str>
    <str name="hvc">hvac</str>
  </lst>
</lst>
<lst name="collation">
  <str name="collationQuery">markup_texts:(performed hvac)</str>
  <int name="hits">4</int>
  <lst name="misspellingsAndCorrections">
    <str name="perfrm">performed</str>
    <str name="hvc">hvac</str>
  </lst>
</lst>

However, if I remove the spellcheck.q parameter I do not, i.e. no
collations are returned for the following:

http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0



If I specify the fields being searched over for the q parameter I get
collations:

http://localhost:8981/solr/articles/select?indent=true&q=markup_texts:(Perfrm%20HVC)&rows=0

<lst name="collation">
  <str name="collationQuery">markup_texts:(perform hvac)</str>
  <int name="hits">4</int>
  <lst name="misspellingsAndCorrections">
    <str name="perfrm">perform</str>
    <str name="hvc">hvac</str>
  </lst>
</lst>
<lst name="collation">
  <str name="collationQuery">markup_texts:(performed hvac)</str>
  <int name="hits">4</int>
  <lst name="misspellingsAndCorrections">
    <str name="perfrm">performed</str>
    <str name="hvc">hvac</str>
  </lst>
</lst>


I'm a bit confused as to what the value for field should be in the spellcheck
component definition. In fact, what is its purpose here, just as the input
for building the spellchecking index? If that is so, then why do I need to
even specify the queryAnalyzerFieldType?

Also, why do I need to explicitly specify the field in the query or
spellcheck.q to get collations?

Thanks and sorry for the rather long question.

Brendan


Re: Spellcheck field element and collation issues

2013-07-23 Thread Brendan Grainger
Hi James,

I get the following response for that query:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">8</int>
    <lst name="params">
      <str name="indent">true</str>
      <str name="q">Perfrm HVC</str>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"></result>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="perfrm">
        <int name="numFound">3</int>
        <int name="startOffset">0</int>
        <int name="endOffset">6</int>
        <int name="origFreq">0</int>
        <arr name="suggestion">
          <lst>
            <str name="word">perform</str>
            <int name="freq">4</int>
          </lst>
          <lst>
            <str name="word">performed</str>
            <int name="freq">1</int>
          </lst>
          <lst>
            <str name="word">performance</str>
            <int name="freq">3</int>
          </lst>
        </arr>
      </lst>
      <lst name="hvc">
        <int name="numFound">2</int>
        <int name="startOffset">7</int>
        <int name="endOffset">10</int>
        <int name="origFreq">0</int>
        <arr name="suggestion">
          <lst>
            <str name="word">hvac</str>
            <int name="freq">4</int>
          </lst>
          <lst>
            <str name="word">have</str>
            <int name="freq">5</int>
          </lst>
        </arr>
      </lst>
      <bool name="correctlySpelled">false</bool>
    </lst>
  </lst>
</response>

Thanks
Brendan


On Tue, Jul 23, 2013 at 3:19 PM, Dyer, James
james.d...@ingramcontent.com wrote:

 For this query:


 http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0

 ...do you get anything back in the spellcheck response?  Is it correcting
 the individual words and not giving collations?  Or are you getting no
 individual word suggestions also?

 James Dyer
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Brendan Grainger [mailto:brendan.grain...@gmail.com]
 Sent: Tuesday, July 23, 2013 1:47 PM
 To: solr-user@lucene.apache.org
 Subject: Spellcheck field element and collation issues

 Hi All,

 I have an IndexBasedSpellChecker component configured as follows (note the
 field parameter is set to the spellcheck field):

   <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

     <str name="queryAnalyzerFieldType">text_spell</str>

     <lst name="spellchecker">
       <str name="name">default</str>
       <str name="classname">solr.IndexBasedSpellChecker</str>
       <!--
       Load tokens from the following field for spell checking,
       analyzer for the field's type as defined in schema.xml are used
       -->
       <str name="field">spellcheck</str>
       <str name="spellcheckIndexDir">./spellchecker</str>
       <float name="thresholdTokenFrequency">.0001</float>
     </lst>
   </searchComponent>

 with the corresponding field type for spellcheck:

 <fieldType name="text_spell" class="solr.TextField"
     positionIncrementGap="100" omitNorms="true">
   <analyzer type="index">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="lang/stopwords_en.txt"
             enablePositionIncrements="true"
             />
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StandardFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory"
             synonyms="moto_synonyms.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="lang/stopwords_en.txt"
             enablePositionIncrements="true"
             />
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StandardFilterFactory"/>
   </analyzer>
 </fieldType>

 and field:

 <!-- spellcheck field is multivalued because it has the title and markup
      fields copied into it -->
 <field name="spellcheck" type="text_spell" stored="false"
        omitTermFreqAndPositions="true" multiValued="true"/>

 values from a markup and title field are copied into the spellcheck field.

 My /select search component has the following defaults:

 <lst name="defaults">
   <str name="echoParams">explicit</str>
   <int name="rows">10</int>
   <str name="df">markup_texts title_texts</str>

   <!-- Spell checking defaults -->
   <str name="spellcheck">true</str>
   <str name="spellcheck.collateExtendedResults">true</str>
   <str name="spellcheck.extendedResults">true</str>
   <str name="spellcheck.maxCollations">2</str>
   <str name="spellcheck.maxCollationTries">5</str>
   <str name="spellcheck.count">5</str>
   <str name="spellcheck.collate">true</str>

   <str name="spellcheck.maxResultsForSuggest">5</str>
   <str name="spellcheck.alternativeTermCount">5</str>

  </lst>


 When I issue a search like this:


 http://localhost:8981/solr/articles/select?indent=true&spellcheck.q=markup_texts:(Perfrm%20HVC)&q=Perfrm%20HVC&rows=0

 I get collations:

 <lst name="collation">
   <str name="collationQuery">markup_texts:(perform hvac)</str>
   <int name="hits">4</int>
   <lst name="misspellingsAndCorrections">
     <str name="perfrm">perform</str>
     <str name="hvc">hvac</str>
   </lst>
 </lst>
 <lst name="collation">
   <str name="collationQuery">markup_texts:(performed hvac)</str>
   <int name="hits">4</int>
   <lst name="misspellingsAndCorrections">
     <str name="perfrm">performed</str>
     <str name="hvc">hvac</str>
   </lst>
 </lst>

 However, if I remove the spellcheck.q parameter I do not, i.e. no
 collations are returned for the following:


 http

Re: Spellcheck field element and collation issues

2013-07-23 Thread Brendan Grainger
Hi James,

If I try:

http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0&maxCollationTries=0

I get the same result:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">7</int>
    <lst name="params">
      <str name="indent">true</str>
      <str name="q">Perfrm HVC</str>
      <str name="maxCollationTries">0</str>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"></result>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="perfrm">
        <int name="numFound">3</int>
        <int name="startOffset">0</int>
        <int name="endOffset">6</int>
        <int name="origFreq">0</int>
        <arr name="suggestion">
          <lst>
            <str name="word">perform</str>
            <int name="freq">4</int>
          </lst>
          <lst>
            <str name="word">performed</str>
            <int name="freq">1</int>
          </lst>
          <lst>
            <str name="word">performance</str>
            <int name="freq">3</int>
          </lst>
        </arr>
      </lst>
      <lst name="hvc">
        <int name="numFound">2</int>
        <int name="startOffset">7</int>
        <int name="endOffset">10</int>
        <int name="origFreq">0</int>
        <arr name="suggestion">
          <lst>
            <str name="word">hvac</str>
            <int name="freq">4</int>
          </lst>
          <lst>
            <str name="word">have</str>
            <int name="freq">5</int>
          </lst>
        </arr>
      </lst>
      <bool name="correctlySpelled">false</bool>
    </lst>
  </lst>
</response>

However, you're right that my df field for the /select handler is in fact:

 <str name="df">markup_texts title_texts</str>

I would note that if I specify the query as follows:

http://localhost:8981/solr/articles/select?indent=true&q=markup_texts:(Perfrm%20HVC)+OR+title_texts:(Perfrm%20HVC)&rows=0&maxCollationTries=0

which is what I thought specifying a df would effectively do, I get
collation results:

<lst name="collation">
  <str name="collationQuery">
    markup_texts:(perform hvac) OR title_texts:(perform hvac)
  </str>
  <int name="hits">4</int>
  <lst name="misspellingsAndCorrections">
    <str name="perfrm">perform</str>
    <str name="hvc">hvac</str>
    <str name="perfrm">perform</str>
    <str name="hvc">hvac</str>
  </lst>
</lst>
<lst name="collation">
  <str name="collationQuery">
    markup_texts:(perform hvac) OR title_texts:(performed hvac)
  </str>
  <int name="hits">4</int>
  <lst name="misspellingsAndCorrections">
    <str name="perfrm">perform</str>
    <str name="hvc">hvac</str>
    <str name="perfrm">performed</str>
    <str name="hvc">hvac</str>
  </lst>
</lst>

I think I'm confused about the relationship between the q parameter and
what the field and queryAnalyzerFieldType are for in the spellcheck
component definition, i.e. what is this for:

   <str name="field">spellcheck</str>

is it even needed if I've specified how the spelling index terms should be
analyzed with:

   <str name="queryAnalyzerFieldType">text_spell</str>

Thanks again
Brendan





On Tue, Jul 23, 2013 at 3:58 PM, Dyer, James
james.d...@ingramcontent.com wrote:

 Try tacking &maxCollationTries=0 onto the URL and see if the collation
 returns.

 If you get a collation, then try the same URL with the collation as the
 q parameter.  Does that get results?

 My suspicion here is that you are assuming that markup_texts is the
 default search field for /select but in fact it isn't.

 James Dyer
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Brendan Grainger [mailto:brendan.grain...@gmail.com]
 Sent: Tuesday, July 23, 2013 2:43 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Spellcheck field element and collation issues

 Hi James,

 I get the following response for that query:

 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">8</int>
     <lst name="params">
       <str name="indent">true</str>
       <str name="q">Perfrm HVC</str>
       <str name="rows">0</str>
     </lst>
   </lst>
   <result name="response" numFound="0" start="0"></result>
   <lst name="spellcheck">
     <lst name="suggestions">
       <lst name="perfrm">
         <int name="numFound">3</int>
         <int name="startOffset">0</int>
         <int name="endOffset">6</int>
         <int name="origFreq">0</int>
         <arr name="suggestion">
           <lst>
             <str name="word">perform</str>
             <int name="freq">4</int>
           </lst>
           <lst>
             <str name="word">performed</str>
             <int name="freq">1</int>
           </lst>
           <lst>
             <str name="word">performance</str>
             <int name="freq">3</int>
           </lst>
         </arr>
       </lst>
       <lst name="hvc">
         <int name="numFound">2</int>
         <int name="startOffset">7</int>
         <int name="endOffset">10</int>
         <int name="origFreq">0</int>
         <arr name="suggestion">
           <lst>
             <str name="word">hvac</str>
             <int name="freq">4</int>
           </lst>
           <lst>
             <str name="word">have</str>
             <int name="freq">5</int>
           </lst>
         </arr>
       </lst>
       <bool name="correctlySpelled">false</bool>
     </lst>
   </lst>
 </response>

 Thanks
 Brendan


 On Tue, Jul 23, 2013 at 3:19 PM, Dyer, James
 james.d...@ingramcontent.comwrote:

  For this query:
 
 
 
  http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0
 
  ...do you get anything back in the spellcheck response?  Is it correcting
  the individual words and not giving collations?  Or are you getting no
  individual word suggestions also?
 
  James Dyer
  Ingram Content Group
  (615) 213-4311
 
 
  -Original Message-
  From: Brendan Grainger [mailto:brendan.grain...@gmail.com]
  Sent: Tuesday, July 23, 2013 1:47 PM
  To: solr-user@lucene.apache.org
  Subject: Spellcheck field element and collation issues
 
  Hi All,
 
  I have an IndexBasedSpellChecker component configured as follows (note
 the
  field parameter is set to the spellcheck field):
 
    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

      <str name="queryAnalyzerFieldType">text_spell</str>

      <lst name="spellchecker">
        <str name="name">default

Re: Spellcheck field element and collation issues

2013-07-23 Thread Brendan Grainger
Thanks James. That's it! Now:

http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0&maxCollationTries=0

returns:

<lst name="collation">
  <str name="collationQuery">perform hvac</str>
  <int name="hits">4</int>
  <lst name="misspellingsAndCorrections">
    <str name="perfrm">perform</str>
    <str name="hvc">hvac</str>
  </lst>
</lst>
<lst name="collation">
  <str name="collationQuery">performed hvac</str>
  <int name="hits">4</int>
  <lst name="misspellingsAndCorrections">
    <str name="perfrm">performed</str>
    <str name="hvc">hvac</str>
  </lst>
</lst>

If you have time, I'm still slightly unclear on the field element in the
spellcheck configuration. Maybe I should explain how I think it works:

1. You create a relatively unanalyzed field type (e.g. no stemming)
2. You copy text you want to be used to build the spellcheck index into
that field.
3. Build the spellcheck sidecar index (or a no-op if using DirectSpellChecker,
in which case I assume it still uses the dedicated spellcheck field the text
was copied into).

When executing a spellcheck request, solr uses the analyzer specified in
queryAnalyzerFieldType to tokenize the query passed in via the q or
spellcheck.q parameter, and this tokenized text is the input to the
spellchecker instance.

Does that sound right?

Thanks
Brendan







On Tue, Jul 23, 2013 at 5:15 PM, Dyer, James
james.d...@ingramcontent.com wrote:

 I don't believe you can specify more than 1 field on df (default field).
  What you want, I think, is qf (query fields), which is available only if
 using dismax/edismax.

 http://wiki.apache.org/solr/SearchHandler#df
 http://wiki.apache.org/solr/ExtendedDisMax#qf_.28Query_Fields.29
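
 For example, something along these lines (handler and field names borrowed
 from earlier in this thread; just a sketch):

 # Search both fields via edismax qf instead of df:
 curl "http://localhost:8981/solr/articles/select?defType=edismax&qf=markup_texts+title_texts&q=Perfrm+HVC&rows=0"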

 James Dyer
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Brendan Grainger [mailto:brendan.grain...@gmail.com]
 Sent: Tuesday, July 23, 2013 3:22 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Spellcheck field element and collation issues

 Hi James,

 If I try:


  http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0&maxCollationTries=0

 I get the same result:

 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">7</int>
     <lst name="params">
       <str name="indent">true</str>
       <str name="q">Perfrm HVC</str>
       <str name="maxCollationTries">0</str>
       <str name="rows">0</str>
     </lst>
   </lst>
   <result name="response" numFound="0" start="0"></result>
   <lst name="spellcheck">
     <lst name="suggestions">
       <lst name="perfrm">
         <int name="numFound">3</int>
         <int name="startOffset">0</int>
         <int name="endOffset">6</int>
         <int name="origFreq">0</int>
         <arr name="suggestion">
           <lst>
             <str name="word">perform</str>
             <int name="freq">4</int>
           </lst>
           <lst>
             <str name="word">performed</str>
             <int name="freq">1</int>
           </lst>
           <lst>
             <str name="word">performance</str>
             <int name="freq">3</int>
           </lst>
         </arr>
       </lst>
       <lst name="hvc">
         <int name="numFound">2</int>
         <int name="startOffset">7</int>
         <int name="endOffset">10</int>
         <int name="origFreq">0</int>
         <arr name="suggestion">
           <lst>
             <str name="word">hvac</str>
             <int name="freq">4</int>
           </lst>
           <lst>
             <str name="word">have</str>
             <int name="freq">5</int>
           </lst>
         </arr>
       </lst>
       <bool name="correctlySpelled">false</bool>
     </lst>
   </lst>
 </response>

 However, you're right that my df field for the /select handler is in fact:

  <str name="df">markup_texts title_texts</str>

 I would note that if I specify the query as follows:


  http://localhost:8981/solr/articles/select?indent=true&q=markup_texts:(Perfrm%20HVC)+OR+title_texts:(Perfrm%20HVC)&rows=0&maxCollationTries=0

 which is what I thought specifying a df would effectively do, I get
 collation results:

 <lst name="collation">
   <str name="collationQuery">
     markup_texts:(perform hvac) OR title_texts:(perform hvac)
   </str>
   <int name="hits">4</int>
   <lst name="misspellingsAndCorrections">
     <str name="perfrm">perform</str>
     <str name="hvc">hvac</str>
     <str name="perfrm">perform</str>
     <str name="hvc">hvac</str>
   </lst>
 </lst>
 <lst name="collation">
   <str name="collationQuery">
     markup_texts:(perform hvac) OR title_texts:(performed hvac)
   </str>
   <int name="hits">4</int>
   <lst name="misspellingsAndCorrections">
     <str name="perfrm">perform</str>
     <str name="hvc">hvac</str>
     <str name="perfrm">performed</str>
     <str name="hvc">hvac</str>
   </lst>
 </lst>

 I think I'm confused about the relationship between the q parameter and
 what the field and queryAnalyzerFieldType are for in the spellcheck
 component definition, i.e. what is this for:

    <str name="field">spellcheck</str>

 is it even needed if I've specified how the spelling index terms should be
 analyzed with:

    <str name="queryAnalyzerFieldType">text_spell</str>

 Thanks again
 Brendan





 On Tue, Jul 23, 2013 at 3:58 PM, Dyer, James
 james.d...@ingramcontent.comwrote:

  Try tacking &maxCollationTries=0 onto the URL and see if the collation
  returns.
 
  If you get a collation, then try the same URL with the collation as the
  q parameter.  Does that get results?
 
  My suspicion here is that you are assuming that markup_texts is the
  default search field for /select but in fact it isn't.
 
  James Dyer
  Ingram Content Group
  (615) 213-4311
 
 
  -Original Message-
  From: Brendan Grainger [mailto:brendan.grain...@gmail.com]
  Sent: Tuesday, July 23, 2013 2:43 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Spellcheck field element

Re: Spellcheck field element and collation issues

2013-07-23 Thread Brendan Grainger
Perfect thanks so much. You just cleared up the other little bit, i.e. when
the SpellingQueryConverter is used/not used and why you might implement
your own.

Thanks again.


On Tue, Jul 23, 2013 at 6:48 PM, Dyer, James
james.d...@ingramcontent.com wrote:

 You've got it.  The only other thing is that spellcheck.q does not
 analyze anything.  The whole purpose of this is to allow you to just send
 raw keywords to be spellchecked.  This is handy if you have a complex q
 parameter (say, you're using local params, etc.) and the
 SpellingQueryConverter cannot handle it.  You could write your own Query
 Converter but it's often just easier to strip out the keywords and send them
 over with spellcheck.q.
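
 Something like this, borrowing the URL shape from earlier in the thread:

 # Complex query syntax stays in q; raw keywords go to the spellchecker:
 curl "http://localhost:8981/solr/articles/select?q=markup_texts:(Perfrm+HVC)&spellcheck.q=Perfrm+HVC&rows=0"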

 James Dyer
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Brendan Grainger [mailto:brendan.grain...@gmail.com]
 Sent: Tuesday, July 23, 2013 4:41 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Spellcheck field element and collation issues

 Thanks James. That's it! Now:


 http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0&maxCollationTries=0

 returns:

 <lst name="collation">
   <str name="collationQuery">perform hvac</str>
   <int name="hits">4</int>
   <lst name="misspellingsAndCorrections">
     <str name="perfrm">perform</str>
     <str name="hvc">hvac</str>
   </lst>
 </lst>
 <lst name="collation">
   <str name="collationQuery">performed hvac</str>
   <int name="hits">4</int>
   <lst name="misspellingsAndCorrections">
     <str name="perfrm">performed</str>
     <str name="hvc">hvac</str>
   </lst>
 </lst>

 If you have time, I'm still slightly unclear on the field element in the
 spellcheck configuration. Maybe I should explain how I think it works:

 1. You create a relatively unanalyzed field type (e.g. no stemming)
 2. You copy text you want to be used to build the spellcheck index into
 that field.
 3. Build the spellcheck sidecar index (or a no-op if using DirectSpellChecker,
 in which case I assume it still uses the dedicated spellcheck field the text
 was copied into).

 When executing a spellcheck request, solr uses the analyzer specified in
 queryAnalyzerFieldType to tokenize the query passed in via the q or
 spellcheck.q parameter, and this tokenized text is the input to the
 spellchecker instance.

 Does that sound right?

 Thanks
 Brendan







 On Tue, Jul 23, 2013 at 5:15 PM, Dyer, James
james.d...@ingramcontent.com wrote:

  I don't believe you can specify more than 1 field on df (default
 field).
   What you want, I think, is qf (query fields), which is available only
 if
  using dismax/edismax.
 
  http://wiki.apache.org/solr/SearchHandler#df
  http://wiki.apache.org/solr/ExtendedDisMax#qf_.28Query_Fields.29
 
  James Dyer
  Ingram Content Group
  (615) 213-4311
 
 
  -Original Message-
  From: Brendan Grainger [mailto:brendan.grain...@gmail.com]
  Sent: Tuesday, July 23, 2013 3:22 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Spellcheck field element and collation issues
 
  Hi James,
 
  If I try:
 
 
 
  http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0&maxCollationTries=0
 
  I get the same result:
 
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">7</int>
      <lst name="params">
        <str name="indent">true</str>
        <str name="q">Perfrm HVC</str>
        <str name="maxCollationTries">0</str>
        <str name="rows">0</str>
      </lst>
    </lst>
    <result name="response" numFound="0" start="0"></result>
    <lst name="spellcheck">
      <lst name="suggestions">
        <lst name="perfrm">
          <int name="numFound">3</int>
          <int name="startOffset">0</int>
          <int name="endOffset">6</int>
          <int name="origFreq">0</int>
          <arr name="suggestion">
            <lst>
              <str name="word">perform</str>
              <int name="freq">4</int>
            </lst>
            <lst>
              <str name="word">performed</str>
              <int name="freq">1</int>
            </lst>
            <lst>
              <str name="word">performance</str>
              <int name="freq">3</int>
            </lst>
          </arr>
        </lst>
        <lst name="hvc">
          <int name="numFound">2</int>
          <int name="startOffset">7</int>
          <int name="endOffset">10</int>
          <int name="origFreq">0</int>
          <arr name="suggestion">
            <lst>
              <str name="word">hvac</str>
              <int name="freq">4</int>
            </lst>
            <lst>
              <str name="word">have</str>
              <int name="freq">5</int>
            </lst>
          </arr>
        </lst>
        <bool name="correctlySpelled">false</bool>
      </lst>
    </lst>
  </response>
 
  However, you're right that my df field for the /select handler is in
 fact:
 
   <str name="df">markup_texts title_texts</str>
 
  I would note that if I specify the query as follows:
 
 
 
  http://localhost:8981/solr/articles/select?indent=true&q=markup_texts:(Perfrm%20HVC)+OR+title_texts:(Perfrm%20HVC)&rows=0&maxCollationTries=0
 
  which is what I thought specifying a df would effectively do, I get
  collation results:
 
  <lst name="collation">
    <str name="collationQuery">
      markup_texts:(perform hvac) OR title_texts:(perform hvac)
    </str>
    <int name="hits">4</int>
    <lst name="misspellingsAndCorrections">
      <str name="perfrm">perform</str>
      <str name="hvc">hvac</str>
      <str name="perfrm">perform</str>
      <str name="hvc">hvac</str>
    </lst>
  </lst>
  <lst name="collation">
    <str name="collationQuery">
      markup_texts:(perform hvac) OR title_texts:(performed hvac)
    </str>
    <int name="hits">4</int>
    <lst name="misspellingsAndCorrections">
      <str name="perfrm">perform</str>
      <str name

Config changes in solr.DirectSolrSpellCheck after index is built?

2013-07-16 Thread Brendan Grainger
Hi All,

Can you change the configuration of a spellchecker
using solr.DirectSolrSpellChecker after you've built an index? I know that
this spellchecker doesn't build an index off to the side like
the IndexBasedSpellChecker does, so I'm wondering what's happening internally
to create a spellchecking dictionary.

Thanks
Brendan

-- 
Brendan Grainger
www.kuripai.com


Changes in DIrectSpellChecker configuration cause hang on startup

2013-07-15 Thread Brendan Grainger
Hi All,

I changed the name of the queryAnalyzerFieldType for my spellcheck
component and the corresponding field and now when solr starts up, it hangs
at this point:

5797 [searcherExecutor-4-thread-1] INFO  org.apache.solr.core.SolrCore  –
QuerySenderListener sending requests to
Searcher@153d12bf main{StandardDirectoryReader(segments_k9p:127340
_1cz(4.3):C387286/120
_2u1(4.3):C405320/146 _4pl(4.3):C493017/136 _65a(4.3):C322122/160
_7ky(4.3):C312296/147 _936(4.3):C326967/135 _b9j(4.3):C474140/229
_cyy(4.3):C298811/88428 _124m(4.3):C622322/137649

My config for the spellcheck component:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

  <str name="queryAnalyzerFieldType">markup</str>

  <!-- Multiple Spell Checkers can be declared and used by this
       component
    -->

  <!-- a spellchecker built from a field of the main index -->
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">markup_texts</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- the spellcheck distance measure used, the default is the
         internal levenshtein -->
    <str name="distanceMeasure">internal</str>
    <!-- minimum accuracy needed to be considered a valid spellcheck
         suggestion -->
    <float name="accuracy">0.5</float>
    <!-- the maximum #edits we consider when enumerating terms: can be 1
         or 2 -->
    <int name="maxEdits">1</int>
    <!-- the minimum shared prefix when enumerating terms -->
    <int name="minPrefix">1</int>
    <!-- maximum number of inspections per result. -->
    <int name="maxInspections">5</int>
    <!-- minimum length of a query term to be considered for correction -->
    <int name="minQueryLength">4</int>
    <!-- maximum threshold of documents a query term can appear to be
         considered for correction -->
    <float name="maxQueryFrequency">0.01</float>
    <!-- uncomment this to require suggestions to occur in 1% of the
         documents
    <float name="thresholdTokenFrequency">.01</float>
    -->
  </lst>

Has anyone got some insight?

Thanks


EmbeddedSolrServer for indexing

2013-07-03 Thread Brendan Grainger
Hi,

I'm experimenting with indexing using the EmbeddedSolrServer. Just to be
sure, as I understand it, I do not need a running instance of solr to use
this, it literally is a running instance of solr.

Given the above, how safe is it to use an EmbeddedSolrServer for indexing
an index that might be simultaneously used by a regular running instance of
solr with clients communicating with it using the usual http interface?

Thanks
Brendan


Re: EmbeddedSolrServer for indexing

2013-07-03 Thread Brendan Grainger
Awesome, thanks. What about indexing in a different core, then renaming it once
it's done?
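
For the record, I was thinking of something along the lines of the CoreAdmin
SWAP action (core names made up):

# Build into a spare core, then swap it in atomically:
curl "http://localhost:8983/solr/admin/cores?action=SWAP&core=live_core&other=rebuild_core"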

Thanks 
Brendan

On Jul 3, 2013, at 6:48 PM, Shawn Heisey s...@elyograg.org wrote:

 On 7/3/2013 2:45 PM, Brendan Grainger wrote:
 I'm experimenting with indexing using the EmbeddedSolrServer. Just to be
 sure, as I understand it, I do not need a running instance of solr to use
 this, it literally is a running instance of solr.
 
 You are correct, EmbeddedSolrServer starts a complete copy of Solr.
 
 Given the above, how safe is it to use an EmbeddedSolrServer for indexing
 an index that might be simultaneously used by a regular running instance of
 solr with clients communicating with it using the usual http interface?
 
 This is a bad idea.  Due to the way that 3.x and earlier work, you might
 be able to get away with it if you have an old version, but bad things
 can happen.
 
 Version 4.x and later will try to obtain a write lock as soon as they
 start, making it very hard to use the same index with more than one
 instance of Solr.
 
 Thanks,
 Shawn
 


Re: The book: Solr 4.x Deep Dive - Early Access Release #1

2013-06-21 Thread Brendan Grainger
Hi Jack,

Just bought the book. One thing I'd love to see in the next edition, based
on your list of candidates above is:

- Autocomplete deep dive

I've been working on implementing this recently and it's much more complex
than just dropping in the Suggester as per the wiki, as I'm sure you know. A
discussion of using faceting as an alternative with shingles and edge n-grams,
how to do spell correction (using FuzzySuggester for example), and the pros and
cons of all those methods would be awesome.

As an aside, I'm currently using the FuzzySuggester which is doing 90% of
what I want.

Thanks
Brendan



On Fri, Jun 21, 2013 at 3:19 PM, Jack Krupansky j...@basetechnology.com wrote:

 I'll work on that and see if I can put it on Lulu in the description.

 -- Jack Krupansky

 -Original Message- From: Alexandre Rafalovitch
 Sent: Friday, June 21, 2013 3:08 PM
 To: solr-user@lucene.apache.org
 Subject: Re: The book: Solr 4.x Deep Dive - Early Access Release #1


 On Fri, Jun 21, 2013 at 2:41 PM, Jack Krupansky j...@basetechnology.com
 wrote:

 Here are the topics that are NOT in the current early-access edition:


 Congratulations. Is there a full (top-level) table of contents
 somewhere? Lulu's preview cuts off too early because the TOC is deeply
 nested.

 Regards,
   Alex.

 Personal website: http://www.outerthoughts.com/
 LinkedIn: 
 http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)




-- 
Brendan Grainger
www.kuripai.com


Any way to have the suggest component be filter query aware?

2013-06-17 Thread Brendan Grainger
Hi All,

I expect the answer is no, but just to be sure I am wondering if there is
any way to make the suggest component (http://wiki.apache.org/solr/Suggester)
filter query aware, i.e. I'd like to have suggestions for a given context,
so say if I were searching in the book "Lucene in Action", suggestions would
be offered for terms that exist in that book, not the entire index.

Otherwise I guess I should look at using EdgeNGramFilter?

Thanks
Brendan

-- 
Brendan Grainger
www.kuripai.com


Re: Suggest and Filtering

2013-06-15 Thread Brendan Grainger
Hi Otis and Jorge,

I probably wasn't phrasing my question too well, but I think I was looking
for FuzzySuggest. Messing around with the configs found here seems to be
doing what I want:

http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/test-files/solr/collection1/conf/solrconfig-phrasesuggest.xml

Thanks
Brendan


On Fri, Jun 14, 2013 at 11:50 AM, Brendan Grainger 
brendan.grain...@gmail.com wrote:

 Hi Otis,

 Sorry was a bit tired when I wrote that. I think what I'd like is to be
 able to spellcheck the suggestions. For example, if a user types in "brayk" (as
 opposed to "brake") I'd still get the following suggestions, say:

 brake line
 brake condition

 Does that make sense?

 Thanks
 Brendan



 On Thu, Jun 13, 2013 at 8:53 PM, Otis Gospodnetic 
 otis.gospodne...@gmail.com wrote:

 Hi,

 I think you are talking about wanting instant search?

 See https://github.com/fergiemcdowall/solrstrap

 Otis
 --
 Solr & ElasticSearch Support
 http://sematext.com/





 On Thu, Jun 13, 2013 at 7:43 PM, Brendan Grainger
 brendan.grain...@gmail.com wrote:
  Hi Solr Guru's
 
  I am trying to implement auto suggest where solr would suggest several
  phrases that would return results as the user types in a query (as
 distinct
  from autocomplete). e.g. say the user starts typing 'br' and we have
  documents that contain "brake pads" and "left disc brake"; solr would
  suggest both of those phrases, with "brake pads" first. I also want to only
  look at documents that match a given filter query. So say I have a
  bunch of documents for a toyota cressida that contain the bi-gram "brake pads",
  while the documents for a honda accord don't have any "brake pad" articles.
  If the user is filtering on the honda accord I wouldn't want "brake pads"
  as a suggestion.
 
  Right now, I've played with the suggest component and using faceting.
 
  Any thoughts?
 
  Thanks
  Brendan
 
  --
  Brendan Grainger
  www.kuripai.com




 --
 Brendan Grainger
 www.kuripai.com




-- 
Brendan Grainger
www.kuripai.com


Re: Suggest and Filtering

2013-06-14 Thread Brendan Grainger
Hi Otis,

Sorry was a bit tired when I wrote that. I think what I'd like is to be
able to spellcheck the suggestions. For example, if a user types in "brayk" (as
opposed to "brake") I'd still get the following suggestions, say:

brake line
brake condition

Does that make sense?

Thanks
Brendan



On Thu, Jun 13, 2013 at 8:53 PM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 Hi,

 I think you are talking about wanting instant search?

 See https://github.com/fergiemcdowall/solrstrap

 Otis
 --
 Solr & ElasticSearch Support
 http://sematext.com/





 On Thu, Jun 13, 2013 at 7:43 PM, Brendan Grainger
 brendan.grain...@gmail.com wrote:
  Hi Solr Guru's
 
  I am trying to implement auto suggest where solr would suggest several
  phrases that would return results as the user types in a query (as
 distinct
  from autocomplete). e.g. say the user starts typing 'br' and we have
  documents that contain "brake pads" and "left disc brake"; solr would
  suggest both of those phrases, with "brake pads" first. I also want to only
  look at documents that match a given filter query. So say I have a bunch
  of documents for a toyota cressida that contain the bi-gram "brake pads",
  while the documents for a honda accord don't have any "brake pad" articles.
  If the user is filtering on the honda accord I wouldn't want "brake pads"
  as a suggestion.
 
  Right now, I've played with the suggest component and using faceting.
 
  Any thoughts?
 
  Thanks
  Brendan
 
  --
  Brendan Grainger
  www.kuripai.com




-- 
Brendan Grainger
www.kuripai.com


Suggest and Filtering

2013-06-13 Thread Brendan Grainger
Hi Solr Guru's

I am trying to implement auto suggest where solr would suggest several
phrases that would return results as the user types in a query (as distinct
from autocomplete). e.g. say the user starts typing 'br' and we have
documents that contain "brake pads" and "left disc brake"; solr would
suggest both of those phrases, with "brake pads" first. I also want to only
look at documents that match a given filter query. So say I have a bunch of
documents for a toyota cressida that contain the bi-gram "brake pads",
while the documents for a honda accord don't have any "brake pad" articles.
If the user is filtering on the honda accord I wouldn't want "brake pads"

Right now, I've played with the suggest component and using faceting.

Any thoughts?

Thanks
Brendan

-- 
Brendan Grainger
www.kuripai.com


Re: Receiving unexpected Faceting results.

2013-06-05 Thread Brendan Grainger
Hi Dotan,

I think all you need to do is add:

facet.mincount=1

i.e.

select?q=*:*&fq=tags:dotan-*&facet=true&facet.field=tags&rows=0&facet.mincount=1

Note that you can do it per field as well:

select?q=*:*&fq=tags:dotan-*&facet=true&facet.field=tags&rows=0&f.tags.facet.mincount=1

http://wiki.apache.org/solr/SimpleFacetParameters#facet.mincount
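
If you're building the query through SolrJ instead, the equivalent should be
something along these lines (an untested sketch; "server" is whatever
SolrServer instance you already have):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;

SolrQuery q = new SolrQuery("*:*");
q.addFilterQuery("tags:dotan-*");
q.setRows(0);
q.setFacet(true);
q.addFacetField("tags");
q.setFacetMinCount(1); // drops the zero-count facet values

QueryResponse rsp = server.query(q);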




On Wed, Jun 5, 2013 at 8:27 AM, Dotan Cohen dotanco...@gmail.com wrote:

 Consider the following Solr query:
 select?q=*:*&fq=tags:dotan-*&facet=true&facet.field=tags&rows=0

 The 'tags' field is a multivalue field. I would expect the previous
 query to return only tags that begin with the string 'dotan-' such as:
 dotan-home
 dotan-work
 ...but not strings which do not begin with (or even contain) the
 string in question.

 However, I am getting these results:
 <lst name="discoapi_tags">
   <int name="dotan-home">14</int>
   <int name="dotan-work">13</int>
   <int name="beer">0</int>
   <int name="beatles">0</int>
 </lst>

 It _may_ be that the 'beer' and 'beatles' tags were once attached to
 the same documents as are attached the 'dotan-home' and/or
 'dotan-work'. I've done a bit of experimenting on this Solr install,
 so I cannot be sure. However, considering that they are in fact 0
 results for those two, I would not expect them to show up at all, even
 if they ever were attached to (i.e. once a value in the multiValue
 field) any of the results that match the filter query.

 So, the questions are:
 1) How can I check if ever the multiValue fields for a particular
 document (given its uniqueKey id) ever contains a specific value.
 Alternatively, how can I see all the values that the document ever had
 for the field. I don't expect this to actually be possible, but I ask
 if it is, i.e. by examining certain aspects of the Solr index with a
 text editor.

 2) If those spurious results are appearing does that mean necessarily
 that those values for the multivalued field were in fact once in the
 multivalued field for documents matching the filter query? Thus, the
 answer to the previous question would be to simply run a query for the
 id of the document in question, and facet on the multivalued field
 with a large limit.

 3) How to have Solr return only those faceting values for the field
 that in fact begin with 'dotan-', even if a document has other tags
 such as 'beatles'?

 4) How to have Solr return only those faceting values which are larger
 than 0?

 Thank you!

 --
 Dotan Cohen

 http://gibberish.co.il
 http://what-is-what.com




-- 
Brendan Grainger
www.kuripai.com


Low Priority: Lucene Facets in Solr?

2013-05-22 Thread Brendan Grainger
Hi All,

Not really a pressing need for this at all, but having worked through a few
tutorials, I was wondering if there was any work being done to incorporate
Lucene Facets into solr:

http://lucene.apache.org/core/4_3_0/facet/org/apache/lucene/facet/doc-files/userguide.html

Brendan


Re: Fast faceting over large number of distinct terms

2013-05-22 Thread Brendan Grainger
Hi David,

Out of interest, what are you trying to accomplish by faceting over the
story_text field? Is it generally the case that the story_text field will
contain values that are repeated or categorize your documents somehow?
 From your description, "story_text is used to store free form text
obtained by crawling newspapers and blogs", it doesn't seem that way, so
I'm not sure faceting is what you want in this situation.

Cheers,
Brendan


On Wed, May 22, 2013 at 9:49 PM, David Larochelle 
dlaroche...@cyber.law.harvard.edu wrote:

 I'm trying to quickly obtain cumulative word frequency counts over all
 documents matching a particular query.

 I'm running Solr 4.3.0 on a machine with 16GB of ram. My index is 2.5 GB
 and has around ~350,000 documents.

 My schema includes the following fields:

 <field name="id" type="string" indexed="true" stored="true" required="true"
 multiValued="false" />
 <field name="media_id" type="int" indexed="true" stored="true"
 required="true" multiValued="false" />
 <field name="story_text" type="text_general" indexed="true" stored="true"
 termVectors="true" termPositions="true" termOffsets="true" />


 story_text is used to store free form text obtained by crawling newspapers
 and blogs.

 Running faceted searches with the "fc" or "fcs" methods fails with the error
 "Too many values for UnInvertedField faceting on field story_text"

 http://localhost:8983/solr/query?q=id:106714828_6621&facet=true&facet.limit=10&facet.pivot=publish_date,story_text&rows=0&facet.method=fcs

 Running faceted search with the 'enum' method succeeds but takes a very
 long time.

 http://localhost:8983/solr/query?q=includes:foobar&facet=true&facet.limit=100&facet.pivot=media_id,includes&facet.method=enum&rows=0

 http://localhost:8983/solr/query?q=includes:mccain&facet=true&facet.limit=100&facet.pivot=media_id,includes&facet.method=enum&rows=0


 The frustrating thing is even if the query only returns a few hundred
 documents, it still takes 10 minutes or longer to get the cumulative word
 count results.

 Eventually we're hoping to build a system that will return results in a few
 seconds and scale to hundreds of millions of documents.
 Is there anyway to get this level of performance out of Solr/Lucene?

 Thanks,

 David




-- 
Brendan Grainger
www.kuripai.com


Re: Low Priority: Lucene Facets in Solr?

2013-05-22 Thread Brendan Grainger
Thanks Jack, no urgency here. I'm unsure that it would even be
easy/beneficial to integrate into solr, but I'm definitely interested in
it.

Brendan


On Wed, May 22, 2013 at 7:00 PM, Jack Krupansky j...@basetechnology.com wrote:

 The topic has come up, but nobody has expressed a sense of urgency.

 It actually has a placeholder Jira:
 https://issues.apache.org/jira/browse/SOLR-4774

 Feel free to add your encouragement there.

 -- Jack Krupansky

 -Original Message- From: Brendan Grainger
 Sent: Wednesday, May 22, 2013 6:39 PM
 To: solr-user@lucene.apache.org
 Subject: Low Priority: Lucene Facets in Solr?


 Hi All,

 Not really a pressing need for this at all, but having worked through a few
 tutorials, I was wondering if there was any work being done to incorporate
 Lucene Facets into solr:

 http://lucene.apache.org/core/4_3_0/facet/org/apache/lucene/facet/doc-files/userguide.html

 Brendan




-- 
Brendan Grainger
www.kuripai.com


Re: Question on implementation for schema design - parsing path information into stored field

2013-05-20 Thread Brendan Grainger
Hi Cord,

I think you'd do it like this:

1. Add this to schema.xml

<!--
  Example of using PathHierarchyTokenizerFactory at index time, so
  queries for paths match documents at that path, or in descendent paths
-->
<fieldType name="descendent_path" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
  </analyzer>
</fieldType>

<field name="folders_facet" type="descendent_path" indexed="true"
stored="true" multiValued="true" />

2. When you index, add the 'folders' to the folders_facet field (or whatever
you want to call it).
3. Your query would look something like:

http://localhost:8982/solr/core_name/select?facet=on&facet.field=folders_facet&facet.mincount=1

There is a good explanation here:
http://wiki.apache.org/solr/HierarchicalFaceting#PathHierarchyTokenizerFactory
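
Putting it together, something like this should work end-to-end from SolrJ
(a sketch only; the document id is made up and the core URL assumes the
query above):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

HttpSolrServer solr = new HttpSolrServer("http://localhost:8982/solr/core_name");

// index a document with its two-level folder path
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "audit-report-1");
doc.addField("folders_facet", "Financial/Audit");
solr.add(doc);
solr.commit();

// facet on the path field; the index-time tokenizer also emits the
// ancestor path "Financial", so both levels show up as facet values
SolrQuery q = new SolrQuery("*:*");
q.setRows(0);
q.setFacet(true);
q.addFacetField("folders_facet");
q.setFacetMinCount(1);
QueryResponse rsp = solr.query(q);
for (FacetField.Count c : rsp.getFacetField("folders_facet").getValues()) {
    System.out.println(c.getName() + " (" + c.getCount() + ")");
}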


Hope that helps.
Brendan






On Mon, May 20, 2013 at 4:18 PM, Cord Thomas cord.tho...@gmail.com wrote:

 Hello,

 I am submitting rich documents to a SOLR index via Solr Cell.   This is all
 working well.

 The documents are organized in meaningful folders.  I would like to capture
 the folder names in my index so that I can use the folder names to provide
 facets.

 I can pass the path data into the indexing process and would like to
 convert 2 paths deep into indexed and stored data - or copy field data.

 Say i have files in these folders:

 Financial
 Financial/Annual
 Financial/Audit
 Organizational
 Organizational/Offices
 Organizational/Staff

 I would like to then provide facets using these names.

 Can someone please guide me in the right direction on how I might
 accomplish this?

 Thank you

 Cord




-- 
Brendan Grainger
www.kuripai.com


Re: Facets referenced by key

2013-05-16 Thread Brendan Grainger
Thanks for the excellent clarification. I'll ask the sunspot guys about the 
localparams issue. I have a patch that would fix it.

Thanks 
Brendan

On May 16, 2013, at 1:42 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

 
 : I would then like to refer to these 'pseudo' field later in the request
 : string. I thought this would be how I'd do it:
 : 
 : f.my_facet_key.facet.prefix=a_given_prefix
...
 
 
 that syntax was proposed in SOLR-1351 and a patch was made available, but 
 it was never committed (it only supported a subset of faceting, needed more 
 tests, and had unclear behavior about how the defaults were picked if 
 you combined f.key.facet.foo + f.field.facet.foo + facet.foo)
 
 : I thought this would work, however it doesn't appear to. What does work is
 : if I define the prefix and mincount in the local params:
 : 
 : facet.field={!ex=dt key=my_facet_key 
 facet.prefix=a_given_prefix}the_facet_field
 
 Correct, SOLR-4717 added support to Solr 4.3 for specifying all of the 
 facet options as local params such that that syntax would work.  Given the 
 way the use of Solr and localparams has evolved over the years it was 
 considered a more natural and logical way to specify facet options on a per 
 field or per key basis.
 
 : Is this expected? I'm also using sunspot and they construct the queries
 : with keys as in my first example, i.e. facet.field={!ex=dt
 : key=my_facet_key}the_facet_field&f.my_facet_key.facet.prefix=a_given_prefix
 
 I can't comment on that ... i'm not sure why sunspot would assume that 
 behavior would work (unless someone looked at SOLR-1351 once upon a time 
 and assumed that would definitely be official at some point)
 
 -Hoss


Facets referenced by key

2013-05-14 Thread Brendan Grainger
Hi All,

I'm creating 2 distinct sets of facet results using a key local param, e.g.:

facet.field={!ex=dt key=my_facet_key}the_facet_field&facet.field={!ex=dt
key=some_other_facet_key}the_facet_field

I would then like to refer to these 'pseudo' fields later in the request
string. I thought this would be how I'd do it:

f.my_facet_key.facet.prefix=a_given_prefix

and

f.my_other_facet_key.prefix=a_different_prefix

So the whole thing would look like (with a similar request string for the
my_other_facet):

facet.field={!ex=dt
key=my_facet_key}the_facet_field&f.my_facet_key.facet.prefix=a_given_prefix


I thought this would work, however it doesn't appear to. What does work is
if I define the prefix and mincount in the local params:

facet.field={!ex=dt key=my_facet_key
facet.prefix=a_given_prefix}the_facet_field

Is this expected? I'm also using sunspot and they construct the queries
with keys as in my first example, i.e. facet.field={!ex=dt
key=my_facet_key}the_facet_field&f.my_facet_key.facet.prefix=a_given_prefix


Thanks
Brendan

-- 
Brendan Grainger
www.kuripai.com


Limiting number of facet results for path based facets

2013-05-10 Thread Brendan Grainger
Hi,

I'm using the PathHierarchyTokenizer like this:

<fieldType name="descendent_path" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
  </analyzer>
</fieldType>

 to create path based facets.

A query like:

http://localhost:8982/solr/articles/select?facet=true&start=0&q=*:*&facet.field=my_path_based_facet&wt=ruby&indent=on

gives facets that look like:

'my_path_based_facet'=>[
'root',4,
'root/books',4,
'root/books/scifi',4,
'root/books/scifi/fantasy',3,
'root/books/scifi/fantasy/general',3,
'root/books/scifi/horror',1,
'root/books/scifi/horror/gore',1,
'root/books/scifi/superhero',1,
'root/books/scifi/superhero/superman',1,
'root/books/scifi/superhero/superman/general',1]}

What I'm wondering is if there is an easy way for me to say I only want up
to level 2, say. So if root was level 0, the returned facets would be:

'root',4,
'root/books',4,
'root/books/scifi',4,


Also, given the description of what I want to achieve, is using
PathHierarchyTokenizer the correct approach?

Thanks
Brendan


Re: Limiting number of facet results for path based facets

2013-05-10 Thread Brendan Grainger
Actually it occurred to me that doing something like this might work:

if I have a category of root/books/scifi for a document I'd create the
following values for the facet:

  0/root
  1/root
  1/root/books
  2/root
  2/root/books
  2/root/books/scifi

Then to get level 1 I'd do:

http://localhost:8982/solr/articles/select?facet=true&start=0&q=*:*&facet.field=my_path_based_facet&wt=ruby&indent=on&facet.prefix=1/root

Is that what people do?
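
Something like this little helper would generate those values at index time
(just a sketch of the idea; the method name is made up):

import java.util.ArrayList;
import java.util.List;

static List<String> levelPrefixedPaths(String path) {
    String[] parts = path.split("/");
    List<String> values = new ArrayList<String>();
    for (int level = 0; level < parts.length; level++) {
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i <= level; i++) {
            if (i > 0) prefix.append('/');
            prefix.append(parts[i]);
            // label every prefix up to this depth with the level number
            values.add(level + "/" + prefix);
        }
    }
    return values;
}

// levelPrefixedPaths("root/books/scifi") => [0/root, 1/root, 1/root/books,
//   2/root, 2/root/books, 2/root/books/scifi]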

Thanks
Brendan



On Fri, May 10, 2013 at 6:50 PM, Brendan Grainger 
brendan.grain...@gmail.com wrote:

 Hi,

 I'm using the PathHierarchyTokenizer like this:

 <fieldType name="descendent_path" class="solr.TextField">
   <analyzer type="index">
     <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/" />
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.KeywordTokenizerFactory" />
   </analyzer>
 </fieldType>

  to create path based facets.

 A query like:


 http://localhost:8982/solr/articles/select?facet=true&start=0&q=*:*&facet.field=my_path_based_facet&wt=ruby&indent=on

 gives facets that look like:

 'my_path_based_facet'=>[
 'root',4,
 'root/books',4,
 'root/books/scifi',4,
 'root/books/scifi/fantasy',3,
 'root/books/scifi/fantasy/general',3,
 'root/books/scifi/horror',1,
 'root/books/scifi/horror/gore',1,
 'root/books/scifi/superhero',1,
 'root/books/scifi/superhero/superman',1,
 'root/books/scifi/superhero/superman/general',1]}

 What I'm wondering is if there is an easy way for me to say I only want up
 to level 2, say. So if root was level 0, the returned facets would be:

 'root',4,
 'root/books',4,
 'root/books/scifi',4,


 Also, given the description of what I want to achieve, is using 
 PathHierarchyTokenizer the correct approach?

 Thanks
 Brendan




-- 
Brendan Grainger
www.kuripai.com


Re: Solr admin stops working

2012-08-02 Thread Brendan Grainger
I assume you're backgrounding solr. Maybe you just need

disown %1
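
or, assuming the usual start.jar setup, launch it detached in the first
place with something like:

nohup java -jar start.jar > solr.log 2>&1 &

so it survives the SSH session ending.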


Brendan

On Aug 2, 2012, at 1:04 PM, Niall n...@neildoyle.com wrote:

 I've got Solr 3.6 up working with Jetty but the admin page is inaccessible
 and Solr appears to stop working when I terminate my SSH connection to the
 server after running start.jar. Am I missing a trick here: how do I keep it
 running?
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-admin-stops-working-tp3998848.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Commit not working after delete

2012-07-19 Thread Brendan Grainger
You might be running into the same issue someone else had the other day:

https://issues.apache.org/jira/browse/SOLR-3432



On Jul 19, 2012, at 1:23 PM, Rohit wrote:

 We deleted some data from solr, after which solr is not accepting any
 commits. What could be wrong?
 
 
 
 We don't see any error in logs or anywhere else.
 
 
 
 Regards,
 
 Rohit
 
 
 



Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-18 Thread Brendan Grainger
Hi Briggs,

I'm not sure about Solr 4.0, but do you need to commit?

 curl 'http://localhost:8983/solr/coupon/update?commit=true' -H 'Content-Type: 
 text/xml' --data-binary '<delete><query>*:*</query></delete>'


Brendan


www.kuripai.com

On Jul 18, 2012, at 7:11 PM, Briggs Thompson wrote:

 I have realized this is not specific to SolrJ but to my instance of Solr. 
 Using curl to delete by query is not working either. 
 
 Running 
 curl http://localhost:8983/solr/coupon/update -H 'Content-Type: text/xml' 
 --data-binary '<delete><query>*:*</query></delete>'
 
 Yields this in the logs:
 INFO: [coupon] webapp=/solr path=/update 
 params={stream.body=<delete><query>*:*</query></delete>} {deleteByQuery=*:*} 
 0 0
 
 But the corpus of documents in the core do not change. 
 
 My solrconfig is pretty barebones at this point, but I attached it in case 
 anyone sees something strange. Anyone have any idea why documents aren't 
 getting deleted?
 
 Thanks in advance,
 Briggs Thompson
 
 On Wed, Jul 18, 2012 at 12:54 PM, Briggs Thompson 
 w.briggs.thomp...@gmail.com wrote:
 Hello All,
 
 I am using 4.0 Alpha and running into an issue with indexing using 
 HttpSolrServer (SolrJ). 
 
 Relevant java code:
 HttpSolrServer solrServer = new HttpSolrServer(MY_SERVER);
 solrServer.setRequestWriter(new BinaryRequestWriter());
 
 Relevant Solrconfig.xml content:
   <requestHandler name="/update" class="solr.UpdateRequestHandler" />
   <requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" />
 
 Indexing documents works perfectly fine (using addBeans()), however, when 
 trying to do deletes I am seeing issues. I tried to do a 
 solrServer.deleteByQuery("*:*") followed by a commit and optimize, and 
 nothing is deleted. 
 
 The response from delete request is a success, and even in the solr logs I 
 see the following:
 INFO: [coupon] webapp=/solr path=/update/javabin 
 params={wt=javabin&version=2} {deleteByQuery=*:*} 0 1
 Jul 18, 2012 11:15:34 AM org.apache.solr.update.DirectUpdateHandler2 commit
 INFO: start 
 commit{flags=0,version=0,optimize=true,openSearcher=true,waitSearcher=false,expungeDeletes=false,softCommit=false}
 
 
 I tried removing the binaryRequestWriter and having the request sent out in 
 default format, and I get the following error. 
 SEVERE: org.apache.solr.common.SolrException: Unsupported ContentType: 
 application/octet-stream  Not in: [application/xml, text/csv, text/json, 
 application/csv, application/javabin, text/xml, application/json]
   at 
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:86)
   at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
   at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
   at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
   at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
   at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
   at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
   at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
   at 
 org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
   at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
   at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
   at 
 org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1001)
   at 
 org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
   at 
 org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
   at java.lang.Thread.run(Thread.java:636)
 
 
 I thought that an optimize does the same thing as expungeDeletes, but in the 
 log I see expungeDeletes=false. Is there a way to force that using SolrJ?
 
 Thanks in advance,
 Briggs
 
 
 solrconfig.xml



Re: Looking for a good Text on Solr

2011-12-16 Thread Brendan Grainger
There is an update to that book for Solr 3:

http://www.packtpub.com/apache-solr-3-enterprise-search-server/book

I actually bought it recently, but haven't looked at it yet.

Good luck.
Brendan

On Dec 16, 2011, at 9:01 PM, Shiv Deepak wrote:

 I am looking for a good book to read from and get a better understanding of 
 solr.
 
 On amazon, all the books on Solr have average ratings (which I suppose means no one 
 tried them or bothered to post a review), but this one, Solr 1.4 Enterprise 
 Search Server by David Smiley and Eric Pugh, has a pretty decent review. But the 
 current version of Solr is 3.5, so should I proceed with David Smiley's book 
 or is there a better text available?
 
 Thanks,
 Shiv Deepak



Implications of setting catenateAll=1

2011-11-17 Thread Brendan Grainger
Hi,

The default for catenateAll is 0, which is what we've been using on the 
WordDelimiterFilter. What would be the possible negative implications of 
setting this to 1? So that:

wi-fi-800

would produce the tokens:

wi, fi, wifi, 800, wifi800

for example? 

Thanks

Anyway to stop an optimize?

2011-11-09 Thread Brendan Grainger
Hi,

Does anyone know if an optimize can be stopped once started?

Thanks


Re: Anyway to stop an optimize?

2011-11-09 Thread Brendan Grainger
I think in the past I've tried that, and it has restarted, although I will have 
to try it out (this time we were loath to stop it as we didn't want any index 
corruption issues). 

A related question is, why did the optimize start? I thought it had to be 
explicitly started, but somehow it started optimizing on its own.

Thanks again
Brendan

On Nov 9, 2011, at 10:44 PM, Walter Underwood wrote:

 If you restart the server, the optimize should stop and not restart, right?
 
 wunder
 
 On Nov 9, 2011, at 7:43 PM, Otis Gospodnetic wrote:
 
 Don't think so, at least not gracefully.  You can always do partial optimize 
 and do a few of them if you want to optimize in smaller steps.
 
 Otis
 
 
 
 From: Brendan Grainger brendan.grain...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, November 9, 2011 4:35 PM
 Subject: Anyway to stop an optimize?
 
 Hi,
 
 Does anyone know if an optimize can be stopped once started?
 
 Thanks
 
 
 
 
 
 



Re: Doing Shingle but also keep special single word

2010-08-20 Thread Brendan Grainger
Hi Scott,

Is there a reason why you wouldn't just index these special words into another 
field and then search over both fields? That would also have the nice property 
of being able to boost on the special word field if you wanted.
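
In SolrJ terms the query side might then look something like this (a sketch
with invented field names; "special_words" would be wherever you indexed
IBM, Microsoft, etc.):

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery q = new SolrQuery("ibm storage servers");
q.set("defType", "dismax");
q.set("qf", "shingled_text special_words^2.0"); // boost the special-word field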

HTH
Brendan

On Aug 20, 2010, at 6:19 AM, scott chu (朱炎詹) wrote:

 I am building an index with the Shingle filter. We know it's minimum 2-gram but I 
 also want to keep some special single words, e.g. IBM, Microsoft, etc. I.e. I 
 want to do a minimum 2-gram but also want to have these single words in my 
 index. Is it possible?
 
 Scott



Re: preside != president

2010-06-28 Thread Brendan Grainger
Hi Darren,

You might want to look at the KStemmer 
(http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem) instead of 
the standard PorterStemmer. It essentially has a 'dictionary' of exception 
words where stemming stops if found, so in your case president won't be stemmed 
any further than president (but presidents will be stemmed to president). You 
will have to integrate it into solr yourself, but that's straightforward. 

HTH
Brendan

 
On Jun 28, 2010, at 8:04 AM, Darren Govoni wrote:

 Hi,
  It seems to me that because the stemming does not produce
 grammatically correct stems in many of the cases,
 search anomalies can occur, like the one I am seeing where I have a
 document with "president" in it and it is returned
 when I search for "preside", a different word entirely.
 
 Is this correct or acceptable behavior? Previous discussions here on
 stemming, I was told its ok as long as all the words reduce
 to the same stem, but when different words reduce to the same stem it
 seems to affect search results in a bad way.
 
 Darren



Re: Using Solr with CouchDB

2010-04-28 Thread Brendan Grainger
Hi Patrick,

I don't know much about couch, but if you want to return json from solr (which I 
think couch would understand) you can do that with wt=json in the query string 
when querying solr. See here for more details: 
http://wiki.apache.org/solr/SolJSON

HTH a little
Brendan

On Apr 28, 2010, at 11:27 AM, Patrick Petermair wrote:

 Hi!
 
 I'm currently trying to implement a full text search for CouchDB using Solr. 
 I went through the tutorial and also some of the examples (slashdot rss feed 
 import, hsql import,..) within the downloadable distribution.
 
 Since CouchDB works with REST + plaintext JSON and Solr is looking for sql 
 queries / xmls (as far as I could gather from the examples), I'm wondering if 
 I'm using the right tools for the job. Has anyone already implemented a 
 search for CouchDB with Solr? Any tutorials, links or sample configs that 
 could help me?
 
 Thanks,
 Patrick



Re: Is there any other tool other than DIH to index a database

2010-04-08 Thread Brendan Grainger
For what it's worth, it's also really easy to implement your own 
EntityProcessor. Extend from EntityProcessorBase then implement the getNext 
method to return a Map<String, Object> representing the row you want indexed. I 
did exactly this so I could reuse my hibernate domain models to query for 
the data instead of sql.
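
The shape of it is roughly this (a sketch only: the domain class and query
helper are placeholders, and the override shown is the nextRow() method
from the DIH EntityProcessor API):

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import org.apache.solr.handler.dataimport.EntityProcessorBase;

public class HibernateEntityProcessor extends EntityProcessorBase {
    private Iterator<Article> articles; // hypothetical Hibernate-backed iterator

    @Override
    public Map<String, Object> nextRow() {
        if (articles == null) articles = queryArticlesSomehow();
        if (!articles.hasNext()) return null; // null tells DIH this entity is done
        Article a = articles.next();
        Map<String, Object> row = new HashMap<String, Object>();
        row.put("id", a.getId());
        row.put("title", a.getTitle());
        return row;
    }
}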

Brendan

On Apr 8, 2010, at 9:17 AM, Shawn Heisey wrote:

 On 4/7/2010 9:26 PM, bbarani wrote:
 Hi,
 
 I am currently using DIH to index the data from a database. I am just trying
 to figure out if there are any other open source tools which I can use just
 for indexing purpose and use SOLR for querying.
 
 I also thought of writing a custom code for retrieving the data from
 database and use SOLRJ to add the data as documents into lucene. One doubt
 here is that if I use the custom code for retrieving the data and use SOLRJ
 to commit that data, will the schema file be still used? I mean the field
 types / analyzers / tokenizers etc.. present in schema file? or do I need to
 manipulate each data (to fit to corresponding data type) in my SOLRJ
 program?
 
   
 
 This response is more of an answer to your earlier message where you asked 
 about batch importing than this exact question, but this is where the 
 discussion is, so I'm answering here.  You could continue to use DIH and 
 specify the batches externally.  I just actually wrote most of this in reply 
 to another email just a few minutes ago.
 
 You can pass variables into the DIH to specify the range of documents that 
 you want to work on, and handle the batching externally.  Start with a 
 full-import or a delete/optimize to clear out the index and then do multiple 
 delta-imports.
 
 Here's what I'm using as the queries in my latest iteration.  The 
 deltaImportQuery is identical to the regular query used for full-import.  The 
 deltaQuery is just something related that returns quickly, the information is 
 thrown away when it does a delta-import.
 
 query="SELECT * FROM ${dataimporter.request.dataTable} WHERE did > 
 ${dataimporter.request.minDid} AND did <= ${dataimporter.request.maxDid} 
 AND (did % ${dataimporter.request.numShards}) IN 
 (${dataimporter.request.modVal})"
 
 deltaQuery="SELECT MAX(did) FROM ${dataimporter.request.dataTable}"
 
 deltaImportQuery="SELECT * FROM ${dataimporter.request.dataTable} WHERE did 
 > ${dataimporter.request.minDid} AND did <= 
 ${dataimporter.request.maxDid} AND (did % ${dataimporter.request.numShards}) 
 IN (${dataimporter.request.modVal})"
 
 Then here is my URL template:
 
 http://HOST:PORT/solr/CORE/dataimport?command=COMMAND&dataTable=DATATABLE&numShards=NUMSHARDS&modVal=MODVAL&minDid=MINDID&maxDid=MAXDID
 
 And the perl data structure that holds the replacements for the uppercase 
 parts:
 
 $urlBits = {
  HOST => $cfg{'shards/inc.host1'},
  PORT => $cfg{'shards/inc.port'},
  MODVAL => $cfg{'shards/inc.modVal'},
  CORE => 'live',
  COMMAND => 'delta-import&commit=true&optimize=false',
  DATATABLE => $cfg{dataTable},
  NUMSHARDS => $cfg{numShards},
  MINDID => $cfg{maxDid},
  MAXDID => $dbMaxDid,
 };
 
 Good luck with your setup!
 
 Shawn
 



Re: Destemming snafu

2009-06-18 Thread Brendan Grainger
Are you using Porter Stemming? If so I think you can just specify your  
word in the protwords.txt file (or whatever you've called it).


Check out http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters  
and the example config for the Porter Stemmer:

<fieldtype name="myfieldtype" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" />
  </analyzer>
</fieldtype>
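
protwords.txt itself is just a plain list, one term per line, so in your
case it would contain a line like:

Stylesightings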

HTH
Brendan

On Jun 18, 2009, at 4:38 PM, Stephen Weiss wrote:


Hi,

I've hit a bit of a problem with destemming and could use some advice.

Right now there is a word in the index called "Stylesight" and  
another word "Stylesightings", which was just added.  When users  
search for "Stylesightings", the client really only wants them to  
get results that match "Stylesightings" and not "Stylesight", as  
they are two [relatively] unrelated things.  However, I'm guessing  
because of the destemmer, "Stylesightings" becomes "Stylesight"  
internally... which results in the wrong behavior.


I really don't want to turn off the destemmer, that's like killing  
an ant with a nuke.  I was thinking, perhaps, since we use both  
index- and query-time synonyms, I could make a synonym like this:


Stylesightings => xlkje0r923jjfsdf

or some other random string of un-destemmable junk, that might work,  
but I'm not sure and reindexing all the affected documents will take  
quite some time so it would be good to know in advance if this is  
even a good idea.


Of course, if there's another, better idea, I'd be very open to that  
too.


Thanks for any suggestions!

--
Steve




Re: [VOTE] Community Logo Preferences

2008-11-25 Thread Brendan Grainger

https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg
https://issues.apache.org/jira/secure/attachment/12394282/solr2_maho_impression.png
https://issues.apache.org/jira/secure/attachment/12394266/apache_solr_b_red.jpg
https://issues.apache.org/jira/secure/attachment/12394353/solr.s5.jpg

On Nov 25, 2008, at 9:05 AM, Marcus Stratmann wrote:


https://issues.apache.org/jira/secure/attachment/12394282/solr2_maho_impression.png
https://issues.apache.org/jira/secure/attachment/12394376/solr_sp.png
https://issues.apache.org/jira/secure/attachment/12393936/logo_remake.jpg
https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg
https://issues.apache.org/jira/secure/attachment/12394353/solr.s5.jpg




Re: Any idea ??? I'm lost .... Thanks

2008-10-01 Thread Brendan Grainger

Hi,

I think:

Can't find resource 'solrconfig.xml' in classpath or 'solr/conf/'

is a major clue no? Do you actually have a solrconfig.xml and how are  
you starting solr?


Regards
Brendan

On Oct 1, 2008, at 11:11 AM, sunnyfr wrote:



Oct  1 16:45:10 solr-test jsvc.exec[23757]: eaa main
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.servlet.SolrServlet init INFO: SolrServlet.init()
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.servlet.SolrServlet init INFO: SolrServlet.init() done
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.servlet.SolrUpdateServlet init INFO:
SolrUpdateServlet.init() done
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.catalina.startup.HostConfig deployWAR INFO: Deploying web
application archive apache-solr-1.3-dev.war
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.servlet.SolrDispatchFilter init INFO:
SolrDispatchFilter.init()
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO: No
/solr/home in JNDI
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO: solr  
home

defaulted to 'solr/' (could not find system property or JNDI)
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.servlet.SolrDispatchFilter initMultiCore INFO:  
looking for

multicore.xml: /solr/multicore.xml
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO: No
/solr/home in JNDI
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO: solr  
home

defaulted to 'solr/' (could not find system property or JNDI)
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.core.SolrResourceLoader init INFO: Solr home set to
'solr/'
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.core.SolrResourceLoader createClassLoader INFO:  
Reusing

parent classloader
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.servlet.SolrDispatchFilter init SEVERE: Could not start
SOLR. Check solr/home property
java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in classpath or 'solr/conf/', cwd=/
    at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:168)
    at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:136)
    at org.apache.solr.core.Config.<init>(Config.java:97)
    at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:108)
    at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:65)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:89)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302)
    at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:78)
    at org.apache.catalina.core...

--
View this message in context: 
http://www.nabble.com/Any-idea-I%27m-lost--Thanks-tp19762598p19762598.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Any idea ??? I'm lost .... Thanks

2008-10-01 Thread Brendan Grainger

Hi Sunny,

Sorry, I've not used multicores with tomcat yet. However, I seem to  
remember that multicore.xml changed its name to solr.xml. I take it  
you're using solr 1.3 or are you using a nightly build of some sort?


Brendan

On Oct 1, 2008, at 11:46 AM, sunnyfr wrote:



Otherwise I've my solr.war in my folder /data/solr/
I've no idea anymore ... Any idea Brendan?


sunnyfr wrote:


I have solrconfig.xml in my folder /data/solr/books/conf/
and I've multicore.xml in /data/solr/
<?xml version="1.0" encoding="UTF-8" ?>
<multicore adminPath="/admin/multicore" persistent="true" sharedLib="lib">

  <core name="video" instanceDir="books" default="true" />
  <core name="group" instanceDir="group" />
  <core name="user" instanceDir="user" />
</multicore>

otherwise I've my solr.xml in /etc/tomcat5.5/Catalina/localhost/solr.xml

<Context path="/solr" docBase="/data/solr/solr.war" debug="0"
crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
value="/data/solr" override="true" />
</Context>

and then I start tomcat5.5 ... do I miss something ?


Brendan Grainger-2 wrote:


Hi,

I think:

Can't find resource 'solrconfig.xml' in classpath or 'solr/conf/'

is a major clue no? Do you actually have a solrconfig.xml and how  
are

you starting solr?

Regards
Brendan

On Oct 1, 2008, at 11:11 AM, sunnyfr wrote:



Oct  1 16:45:10 solr-test jsvc.exec[23757]: eaa main
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.servlet.SolrServlet init INFO: SolrServlet.init()
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.servlet.SolrServlet init INFO: SolrServlet.init()  
done

Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.servlet.SolrUpdateServlet init INFO:
SolrUpdateServlet.init() done
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.catalina.startup.HostConfig deployWAR INFO: Deploying  
web

application archive apache-solr-1.3-dev.war
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.servlet.SolrDispatchFilter init INFO:
SolrDispatchFilter.init()
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO: No
/solr/home in JNDI
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO:  
solr

home
defaulted to 'solr/' (could not find system property or JNDI)
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.servlet.SolrDispatchFilter initMultiCore INFO:
looking for
multicore.xml: /solr/multicore.xml
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO: No
/solr/home in JNDI
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO:  
solr

home
defaulted to 'solr/' (could not find system property or JNDI)
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.core.SolrResourceLoader init INFO: Solr home  
set to

'solr/'
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.core.SolrResourceLoader createClassLoader INFO:
Reusing
parent classloader
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.servlet.SolrDispatchFilter init SEVERE: Could not
start
SOLR. Check solr/home property java.lang.RuntimeException: Can't  
find

resource 'solrconfig.xml' in classpath or 'solr/conf/', cwd=/ ^Iat
org
.apache
.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:
168)
^Iat
org
.apache
.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java: 
136)

^Iat org.apache.solr.core.Config.init(Config.java:97) ^Iat
org.apache.solr.core.SolrConfig.init(SolrConfig.java:108) ^Iat
org.apache.solr.core.SolrConfig.init(SolrConfig.java:65) ^Iat
org
.apache 
.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:

89)
^Iat
org
.apache
.catalina
.core 
.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:

221)
^Iat
org
.apache
.catalina
.core
.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:
302)
^Iat
org
.apache
.catalina
.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java: 
78)

^Iat org.apache.catalina.core

--
View this message in context:
http://www.nabble.com/Any-idea-I%27m-lost--Thanks-tp19762598p19762598.html
Sent from the Solr - User mailing list archive at Nabble.com.










--
View this message in context: 
http://www.nabble.com/Any-idea-I%27m-lost--Thanks-tp19762598p19763325.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Any idea ??? I'm lost .... Thanks

2008-10-01 Thread Brendan Grainger
Where is this log from? Not from your original stacktrace clearly. So  
are you getting a different error now? Also, which version of solr are  
you using, 1.3 or a nightly build?



On Oct 1, 2008, at 12:07 PM, sunnyfr wrote:



It's weird because earlier in the log it looks like it did find it:

Oct  1 18:00:36 solr-test jsvc.exec[5550]: Oct 1, 2008 6:00:36 PM
org.apache.solr.core.SolrResourceLoader init INFO: Solr home set to
'/data/solr/group/'
Oct  1 18:00:36 solr-test jsvc.exec[5550]: Oct 1, 2008 6:00:36 PM
org.apache.solr.core.SolrResourceLoader createClassLoader INFO:  
Reusing

parent classloader
Oct  1 18:00:36 solr-test jsvc.exec[5550]: Oct 1, 2008 6:00:36 PM
org.apache.solr.core.SolrConfig init INFO: Loaded SolrConfig:
solrconfig.xml
Oct  1 18:00:36 solr-test jsvc.exec[5550]: Oct 1, 2008 6:00:36 PM
org.apache.solr.schema.IndexSchema readSchema INFO: Reading Solr  
Schema

Oct  1 18:00:36 solr-test jsvc.exec[5550]: Oct 1, 2008 6:00:36 PM
org.apache.solr.schema.IndexSchema readSchema INFO: Schema name=videos

it's just after, when it tries to deploy??




sunnyfr wrote:


I have solrconfig.xml in my folder /data/solr/books/conf/
and I've multicore.xml in /data/solr/
<?xml version="1.0" encoding="UTF-8" ?>
<multicore adminPath="/admin/multicore" persistent="true" sharedLib="lib">

  <core name="video" instanceDir="books" default="true" />
  <core name="group" instanceDir="group" />
  <core name="user" instanceDir="user" />
</multicore>

otherwise I've my solr.xml in /etc/tomcat5.5/Catalina/localhost/solr.xml

<Context path="/solr" docBase="/data/solr/solr.war" debug="0"
crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
value="/data/solr" override="true" />
</Context>

and then I start tomcat5.5 ... do I miss something ?


Brendan Grainger-2 wrote:


Hi,

I think:

Can't find resource 'solrconfig.xml' in classpath or 'solr/conf/'

is a major clue no? Do you actually have a solrconfig.xml and how  
are

you starting solr?

Regards
Brendan

On Oct 1, 2008, at 11:11 AM, sunnyfr wrote:



Oct  1 16:45:10 solr-test jsvc.exec[23757]: eaa main
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.servlet.SolrServlet init INFO: SolrServlet.init()
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.servlet.SolrServlet init INFO: SolrServlet.init()  
done

Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.servlet.SolrUpdateServlet init INFO:
SolrUpdateServlet.init() done
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.catalina.startup.HostConfig deployWAR INFO: Deploying  
web

application archive apache-solr-1.3-dev.war
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.servlet.SolrDispatchFilter init INFO:
SolrDispatchFilter.init()
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO: No
/solr/home in JNDI
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO:  
solr

home
defaulted to 'solr/' (could not find system property or JNDI)
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.servlet.SolrDispatchFilter initMultiCore INFO:
looking for
multicore.xml: /solr/multicore.xml
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO: No
/solr/home in JNDI
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO:  
solr

home
defaulted to 'solr/' (could not find system property or JNDI)
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.core.SolrResourceLoader init INFO: Solr home  
set to

'solr/'
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.core.SolrResourceLoader createClassLoader INFO:
Reusing
parent classloader
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10 PM
org.apache.solr.servlet.SolrDispatchFilter init SEVERE: Could not start
SOLR. Check solr/home property
java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in classpath or 'solr/conf/', cwd=/
    at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:168)
    at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:136)
    at org.apache.solr.core.Config.<init>(Config.java:97)
    at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:108)
    at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:65)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:89)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302)
    at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:78)
    at org.apache.catalina...

Re: Any idea ??? I'm lost .... Thanks

2008-10-01 Thread Brendan Grainger

Sorry Sunny,

Will have to punt on this one. If I were you I'd try using 1.3. To be  
honest, if I remember correctly, 1.2 didn't have multicore support.


Regards
Brendan


On Oct 1, 2008, at 12:20 PM, sunnyfr wrote:



Thanks Brendan,

I use solr 1.2 ... I will update to solr 1.3 soon .. I tried to  
rename it

... but still .
help i need somebody .. heppp LOL

Thanks Brendan



Brendan Grainger-2 wrote:


Hi Sunny,

Sorry, I've not used multicores with tomcat yet. However, I seem to
remember that multicore.xml changed its name to solr.xml. I take it
you're using solr 1.3 or are you using a nightly build of some sort?

Brendan

On Oct 1, 2008, at 11:46 AM, sunnyfr wrote:



Otherwise I've my solr.war in my folder /data/solr/
I've no idea anymore ... Any idea Brendan?


sunnyfr wrote:


I have solrconfig.xml in my folder /data/solr/books/conf/
and I've multicore.xml in /data/solr/
<?xml version="1.0" encoding="UTF-8" ?>
<multicore adminPath="/admin/multicore" persistent="true" sharedLib="lib">

  <core name="video" instanceDir="books" default="true" />
  <core name="group" instanceDir="group" />
  <core name="user" instanceDir="user" />
</multicore>

otherwise I've my solr.xml in /etc/tomcat5.5/Catalina/localhost/solr.xml

<Context path="/solr" docBase="/data/solr/solr.war" debug="0"
crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
value="/data/solr" override="true" />
</Context>

and then I start tomcat5.5 ... do I miss something ?


Brendan Grainger-2 wrote:


Hi,

I think:

Can't find resource 'solrconfig.xml' in classpath or 'solr/conf/'

is a major clue no? Do you actually have a solrconfig.xml and how
are
you starting solr?

Regards
Brendan

On Oct 1, 2008, at 11:11 AM, sunnyfr wrote:



Oct  1 16:45:10 solr-test jsvc.exec[23757]: eaa main
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10  
PM

org.apache.solr.servlet.SolrServlet init INFO: SolrServlet.init()
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10  
PM

org.apache.solr.servlet.SolrServlet init INFO: SolrServlet.init()
done
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10  
PM

org.apache.solr.servlet.SolrUpdateServlet init INFO:
SolrUpdateServlet.init() done
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10  
PM

org.apache.catalina.startup.HostConfig deployWAR INFO: Deploying
web
application archive apache-solr-1.3-dev.war
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10  
PM

org.apache.solr.servlet.SolrDispatchFilter init INFO:
SolrDispatchFilter.init()
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10  
PM
org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO:  
No

/solr/home in JNDI
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10  
PM

org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO:
solr
home
defaulted to 'solr/' (could not find system property or JNDI)
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10  
PM

org.apache.solr.servlet.SolrDispatchFilter initMultiCore INFO:
looking for
multicore.xml: /solr/multicore.xml
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10  
PM
org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO:  
No

/solr/home in JNDI
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10  
PM

org.apache.solr.core.SolrResourceLoader locateInstanceDir INFO:
solr
home
defaulted to 'solr/' (could not find system property or JNDI)
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10  
PM

org.apache.solr.core.SolrResourceLoader init INFO: Solr home
set to
'solr/'
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10  
PM

org.apache.solr.core.SolrResourceLoader createClassLoader INFO:
Reusing
parent classloader
Oct  1 16:45:10 solr-test jsvc.exec[23757]: Oct 1, 2008 4:45:10  
PM

org.apache.solr.servlet.SolrDispatchFilter init SEVERE: Could not start
SOLR. Check solr/home property
java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in classpath or 'solr/conf/', cwd=/
    at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:168)
    at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:136)
    at org.apache.solr.core.Config.<init>(Config.java:97)
    at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:108)
    at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:65)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:89)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302)
    at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:78)
    at org.apache.catalina.core...

--
View this message in context:
http://www.nabble.com/Any-idea-I%27m-lost--Thanks-tp19762598p19762598.html
Sent from the Solr - User mailing list archive at Nabble.com.










--
View

Re: Word Gram?

2008-08-13 Thread Brendan Grainger

Hi Ryan,

We do basically the same thing, using a modified ShingleFilter (http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//contrib-analyzers/org/apache/lucene/analysis/shingle/ShingleFilter.html 
). I have it set up to build 'shingles' of size 2, 3, 4, 5 which I  
index into separate fields. If there is a better way of doing this  
sort of thing I'd love to know :-)
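
In case it's useful, here's the idea as a minimal sketch against the Lucene 2.x-era contrib API (the tokenizer choice and class name are illustrative, not our exact setup):

import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.shingle.ShingleFilter;

public class ShingleSketch {
  // One stream per shingle size (2..5 in our case); each feeds its own field.
  static TokenStream shingles(String text, int maxShingleSize) {
    TokenStream words = new WhitespaceTokenizer(new StringReader(text));
    // ShingleFilter emits word n-grams up to maxShingleSize (joined with a
    // space) alongside the original unigrams.
    return new ShingleFilter(words, maxShingleSize);
  }
}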


Brendan

On Aug 13, 2008, at 3:59 PM, Ryan McKinley wrote:

I'm looking for a way to get common word groups within documents.   
That is, what are the top two, three, ... n word groups within the  
index.


I was messing with indexing adjacent words together (sorry about the  
earlier commit)... is this a reasonable approach?  Any other ideas  
for pulling out common phrases?  Any simple post processing?


ryan




Re: Slight issue with classloading and DataImportHandler

2008-06-18 Thread Brendan Grainger

Hi,

I am actually providing the fully qualified classname in the  
configuration and I was still getting a ClassNotFoundException. If you  
look at the code in SolrResourceLoader they actually explicitly add  
the jars in solr-home/lib to the classloader:


static ClassLoader createClassLoader(File f, ClassLoader loader) {
  if (loader == null) {
    loader = Thread.currentThread().getContextClassLoader();
  }
  if (f.canRead() && f.isDirectory()) {
    File[] jarFiles = f.listFiles();
    URL[] jars = new URL[jarFiles.length];
    try {
      for (int j = 0; j < jarFiles.length; j++) {
        jars[j] = jarFiles[j].toURI().toURL();
        log.info("Adding '" + jars[j].toString() + "' to Solr classloader");
      }
      return URLClassLoader.newInstance(jars, loader);
    } catch (MalformedURLException e) {
      SolrException.log(log, "Can't construct solr lib class loader", e);
    }
  }
  log.info("Reusing parent classloader");
  return loader;
}


This seems to me to be why my class is now found when I include my utilities jar in solr-home/lib.


Thanks
Brendan

On Jun 18, 2008, at 11:49 PM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:



hi,
DIH does not load classes using the SolrResourceLoader. It tries a Class.forName() with the name you provide; if that fails, it prepends "org.apache.solr.handler.dataimport." and retries.

This is true for not just transformers but also for Entityprocessor,
DataSource and Evaluator

The reason for doing so is that we do not use any of the 'solr.'
packages in DIH. All our implementations fall into the default package
and we can directly use them w/o the package name.

So , if you are writing your own implementations use the default
package or provide the fully qualified class name.
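
In other words, the lookup order is roughly this (a sketch of the behavior described above, not the actual DIH source; loadDihClass is a made-up name):

static Class<?> loadDihClass(String name) throws ClassNotFoundException {
  try {
    // a fully qualified name resolves directly
    return Class.forName(name);
  } catch (ClassNotFoundException e) {
    // otherwise retry inside the default DataImportHandler package
    return Class.forName("org.apache.solr.handler.dataimport." + name);
  }
}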

--Noble

On Thu, Jun 19, 2008 at 8:09 AM, Jon Baer [EMAIL PROTECTED] wrote:
Thanks.  Yeah, took me a while to figure out I needed to do something like transformer="com.mycompany.solr.MyTransformer" on the entity before it would work ...

- Jon

On Jun 18, 2008, at 1:51 PM, Brendan Grainger wrote:


Hi,

I set up the new DataimportHandler last night to replace some custom
import code I'd written and so far I'm loving it thank you.

I had one issue you might want to know about it. I have some solr
extensions I've written and packaged in a jar which I place in:

solr-home/lib

as per:


http://wiki.apache.org/solr/SolrPlugins#head-59e2685df65335e82f8936ed55d260842dc7a4dc

This works well for my handlers but a custom Transformer I wrote and
packaged the same way was throwing a ClassNotFoundException. I  
tracked it

down to the DocBuilder.loadClass method which was just doing a
Class.forName. Anyway, I fixed it for the moment by probably doing something stupid and creating a SolrResourceLoader (which I imagine could be an instance variable, but at 3am I just wanted to get it working). Anyway, this fixes the problem:

@SuppressWarnings("unchecked")
static Class loadClass(String name) throws ClassNotFoundException {
  SolrResourceLoader loader = new SolrResourceLoader(null);
  return loader.findClass(name);
  // return Class.forName(name);
}

Brendan







--
--Noble Paul




Re: Seeking Feedback: non back compat for Java API of 3 FilterFactories in 1.3?

2008-06-13 Thread Brendan Grainger
Same here. I took a look at the options from the dev list and it seems to me that (3), user education, should be fine.


Thanks for all the great work.
Brendan

On Jun 13, 2008, at 4:37 PM, Brian Johnson wrote:


FWIW - I have no problem with the change.

Thanks,

Brian


- Original Message 
From: Walter Underwood [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Friday, June 13, 2008 11:38:27 AM
Subject: Re: Seeking Feedback: non back compat for Java API of 3  
FilterFactories in 1.3?


We use it out of the box. Our extensions are new filters or new
request handlers, all configured through the XML files.

wunder

On 6/13/08 11:15 AM, Chris Hostetter [EMAIL PROTECTED]  
wrote:




The Solr Developers would like some feedback from the user community regarding some changes that have been made to StopFilterFactory, SynonymFilterFactory, and EnglishPorterFilterFactory since Solr 1.2, which break backwards compatibility in situations where client Java code directly constructs and initializes instances of these classes.

These changes do *NOT* affect Solr users who use Solr out of the  
box.


The only people who might possibly be impacted by these changes are  
users

who write custom Java code using the Solr APIs and directly construct
instances (instead of getting them from an IndexSchema object) using
code such as this
StopFilterFactory f = new StopFilterFactory();
f.init(new HashMap<String,String>());
// now do something with f

If this does not apply to you, you can safely ignore this thread.

If this does apply to you, please review SOLR-594 and the mailing  
list
threads linked to from that issue and let us know (either by  
replying to
this thread, or by posting a comment in the Jira issue) what you  
think
about the proposed solution -- Documenting that when upgrading to  
Solr

1.3, any custom code like this would need to be changed like so...
StopFilterFactory f = new StopFilterFactory();
f.init(new HashMap<String,String>());
f.inform(SolrCore.getSolrCore().getSolrConfig().getResourceLoader());

// now do something with f

Of the options available, it is our belief that this is: 1) the  
simplest

approach; 2) benefits the majority of users automatically; 3) adversely
affects the fewest number of people; 4) affects those people in a
relatively small way (requiring one new line of code).  But we do  
want
to verify that the number of people affected is in fact relatively  
small.


https://issues.apache.org/jira/browse/SOLR-594

Thanks.


-Hoss





Question about fieldNorm

2008-06-11 Thread Brendan Grainger

Hi,

I've just changed the stemming algorithm slightly and am running a few  
tests against the old stemmer versus the new stemmer. I did a query  
for 'hanger' and using the old stemmer I get the following scoring for  
a document with the title: Converter Hanger Assembly Replacement


6.4242806 = (MATCH) sum of:
  2.5697122 = (MATCH) max of:
0.2439919 = (MATCH) weight(markup_t:hanger in 3454), product of:
  0.1963516 = queryWeight(markup_t:hanger), product of:
6.5593724 = idf(docFreq=6375, numDocs=1655591)
0.02993451 = queryNorm
  1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

1.7320508 = tf(termFreq(markup_t:hanger)=3)
6.5593724 = idf(docFreq=6375, numDocs=1655591)
0.109375 = fieldNorm(field=markup_t, doc=3454)
2.5697122 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
  0.5547002 = queryWeight(title_t:hanger^2.0), product of:
2.0 = boost
9.265229 = idf(docFreq=425, numDocs=1655591)
0.02993451 = queryNorm
  4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

1.0 = tf(termFreq(title_t:hanger)=1)
9.265229 = idf(docFreq=425, numDocs=1655591)
0.5 = fieldNorm(field=title_t, doc=3454)
  3.8545685 = (MATCH) max of:
0.12199595 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product  
of:

  0.0981758 = queryWeight(markup_t:hanger^0.5), product of:
0.5 = boost
6.5593724 = idf(docFreq=6375, numDocs=1655591)
0.02993451 = queryNorm
  1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

1.7320508 = tf(termFreq(markup_t:hanger)=3)
6.5593724 = idf(docFreq=6375, numDocs=1655591)
0.109375 = fieldNorm(field=markup_t, doc=3454)
3.8545685 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
  0.8320503 = queryWeight(title_t:hanger^3.0), product of:
3.0 = boost
9.265229 = idf(docFreq=425, numDocs=1655591)
0.02993451 = queryNorm
  4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

1.0 = tf(termFreq(title_t:hanger)=1)
9.265229 = idf(docFreq=425, numDocs=1655591)
0.5 = fieldNorm(field=title_t, doc=3454)

Using the new stemmer I get:

5.621245 = (MATCH) sum of:
  2.248498 = (MATCH) max of:
0.24399184 = (MATCH) weight(markup_t:hanger in 3454), product of:
  0.19635157 = queryWeight(markup_t:hanger), product of:
6.559371 = idf(docFreq=6375, numDocs=1655589)
0.029934512 = queryNorm
  1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

1.7320508 = tf(termFreq(markup_t:hanger)=3)
6.559371 = idf(docFreq=6375, numDocs=1655589)
0.109375 = fieldNorm(field=markup_t, doc=3454)
2.248498 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
  0.5547002 = queryWeight(title_t:hanger^2.0), product of:
2.0 = boost
9.265228 = idf(docFreq=425, numDocs=1655589)
0.029934512 = queryNorm
  4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

1.0 = tf(termFreq(title_t:hanger)=1)
9.265228 = idf(docFreq=425, numDocs=1655589)
0.4375 = fieldNorm(field=title_t, doc=3454)
  3.372747 = (MATCH) max of:
0.12199592 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product  
of:

  0.09817579 = queryWeight(markup_t:hanger^0.5), product of:
0.5 = boost
6.559371 = idf(docFreq=6375, numDocs=1655589)
0.029934512 = queryNorm
  1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

1.7320508 = tf(termFreq(markup_t:hanger)=3)
6.559371 = idf(docFreq=6375, numDocs=1655589)
0.109375 = fieldNorm(field=markup_t, doc=3454)
3.372747 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
  0.83205026 = queryWeight(title_t:hanger^3.0), product of:
3.0 = boost
9.265228 = idf(docFreq=425, numDocs=1655589)
0.029934512 = queryNorm
  4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

1.0 = tf(termFreq(title_t:hanger)=1)
9.265228 = idf(docFreq=425, numDocs=1655589)
0.4375 = fieldNorm(field=title_t, doc=3454)

The thing that is perplexing is that the fieldNorm for the title_t  
field is different in each of the explanations, ie: the fieldNorm  
using the old stemmer is: 0.5 = fieldNorm(field=title_t, doc=3454).  
For the new stemmer  0.4375 = fieldNorm(field=title_t, doc=3454). I  
ran the title through both stemmers and get the same number of tokens  
produced. I do no index time boosting on the title_t field. I am using  
DefaultSimilarity in both instances. So I figured the calculated  
fieldNorm would be:


field boost * lengthNorm = 1 * 1/sqrt(4) = 0.5

I wouldn't have thought that changing the stemmer would have any impact on the fieldNorm in this case. Any insight? Please kick me over to the lucene list if you feel this isn't appropriate.
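
For reference, a quick sanity check of that calculation (a minimal sketch of DefaultSimilarity's 1/sqrt(numTerms) length norm, assuming a field boost of 1):

public class LengthNormCheck {
  // DefaultSimilarity's length norm is 1/sqrt(number of terms in the field);
  // with a field boost of 1 the fieldNorm should equal this value.
  static float lengthNorm(int numTerms) {
    return (float) (1.0 / Math.sqrt(numTerms));
  }

  public static void main(String[] args) {
    System.out.println(lengthNorm(4)); // 0.5, matching the old-stemmer explain
    System.out.println(lengthNorm(5)); // ~0.4472, if a fifth token sneaks in
  }
}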

Re: CSV output

2008-06-11 Thread Brendan Grainger
When I was asked for something similar I quickly cobbled together a  
stylesheet (I'm no xsl expert so it's probably pretty bad).


Invoked like this:

http://localhost:8982/solr/select?q=testingfl=id,title_t,scorewt=xslttr=csv.xslrows=10

YMMV, but feel free to use it if it helps, I've attached it.

Brendan








On Jun 11, 2008, at 1:05 PM, Walter Underwood wrote:


I recommend using the OpenCSV package. Works fine, Apache 2.0 license.

http://opencsv.sourceforge.net/

wunder

On 6/11/08 10:00 AM, Otis Gospodnetic [EMAIL PROTECTED]  
wrote:



Hi Marshall,

I don't think there is a CSV Writer, but here are some pointers for  
writing

one:

$ ff \*Writer\*java | grep -v Test | grep request
./src/java/org/apache/solr/request/PHPResponseWriter.java
./src/java/org/apache/solr/request/XSLTResponseWriter.java
./src/java/org/apache/solr/request/JSONResponseWriter.java
./src/java/org/apache/solr/request/PythonResponseWriter.java
./src/java/org/apache/solr/request/RawResponseWriter.java
./src/java/org/apache/solr/request/QueryResponseWriter.java
./src/java/org/apache/solr/request/PHPSerializedResponseWriter.java
./src/java/org/apache/solr/request/BinaryResponseWriter.java
./src/java/org/apache/solr/request/RubyResponseWriter.java
./src/java/org/apache/solr/request/TextResponseWriter.java
./src/java/org/apache/solr/request/XMLWriter.java
./src/java/org/apache/solr/request/BinaryQueryResponseWriter.java
./src/java/org/apache/solr/request/XMLResponseWriter.java

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 

From: Marshall Weir [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, June 11, 2008 12:52:50 PM
Subject: CSV output

Hi,

Does SOLR have .csv output? I can find references to .csv input, but
not output.

Thank you,
Marshall








Re: Question about fieldNorm

2008-06-11 Thread Brendan Grainger

Hi Yonik,

Yes I did rebuild the index and they are the same document (just  
verified). The only thing that changed was the stemmer, but that makes  
no sense to me. Also, if the equation for the fieldNorm is:


fieldBoost * lengthNorm = fieldBoost * 1/sqrt(numTermsForField)

Then that would mean numTermsForField would be: 5.22 when the norm is  
0.4375. Am I correct about how this is calculated?


Thanks again
Brendan

On Jun 11, 2008, at 1:37 PM, Yonik Seeley wrote:


That is strange... did you re-index or change the index?  If so, you
might want to verify that docid=3454 still corresponds to the same
document you queried earlier.

-Yonik


On Wed, Jun 11, 2008 at 1:09 PM, Brendan Grainger
[EMAIL PROTECTED] wrote:
I've just changed the stemming algorithm slightly and am running a  
few tests
against the old stemmer versus the new stemmer. I did a query for  
'hanger'
and using the old stemmer I get the following scoring for a  
document with

the title: Converter Hanger Assembly Replacement

6.4242806 = (MATCH) sum of:
2.5697122 = (MATCH) max of:
  0.2439919 = (MATCH) weight(markup_t:hanger in 3454), product of:
0.1963516 = queryWeight(markup_t:hanger), product of:
  6.5593724 = idf(docFreq=6375, numDocs=1655591)
  0.02993451 = queryNorm
1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

  1.7320508 = tf(termFreq(markup_t:hanger)=3)
  6.5593724 = idf(docFreq=6375, numDocs=1655591)
  0.109375 = fieldNorm(field=markup_t, doc=3454)
  2.5697122 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
0.5547002 = queryWeight(title_t:hanger^2.0), product of:
  2.0 = boost
  9.265229 = idf(docFreq=425, numDocs=1655591)
  0.02993451 = queryNorm
4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

  1.0 = tf(termFreq(title_t:hanger)=1)
  9.265229 = idf(docFreq=425, numDocs=1655591)
  0.5 = fieldNorm(field=title_t, doc=3454)
3.8545685 = (MATCH) max of:
  0.12199595 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product  
of:

0.0981758 = queryWeight(markup_t:hanger^0.5), product of:
  0.5 = boost
  6.5593724 = idf(docFreq=6375, numDocs=1655591)
  0.02993451 = queryNorm
1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

  1.7320508 = tf(termFreq(markup_t:hanger)=3)
  6.5593724 = idf(docFreq=6375, numDocs=1655591)
  0.109375 = fieldNorm(field=markup_t, doc=3454)
  3.8545685 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
0.8320503 = queryWeight(title_t:hanger^3.0), product of:
  3.0 = boost
  9.265229 = idf(docFreq=425, numDocs=1655591)
  0.02993451 = queryNorm
4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

  1.0 = tf(termFreq(title_t:hanger)=1)
  9.265229 = idf(docFreq=425, numDocs=1655591)
  0.5 = fieldNorm(field=title_t, doc=3454)

Using the new stemmer I get:

5.621245 = (MATCH) sum of:
2.248498 = (MATCH) max of:
  0.24399184 = (MATCH) weight(markup_t:hanger in 3454), product of:
0.19635157 = queryWeight(markup_t:hanger), product of:
  6.559371 = idf(docFreq=6375, numDocs=1655589)
  0.029934512 = queryNorm
1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

  1.7320508 = tf(termFreq(markup_t:hanger)=3)
  6.559371 = idf(docFreq=6375, numDocs=1655589)
  0.109375 = fieldNorm(field=markup_t, doc=3454)
  2.248498 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
0.5547002 = queryWeight(title_t:hanger^2.0), product of:
  2.0 = boost
  9.265228 = idf(docFreq=425, numDocs=1655589)
  0.029934512 = queryNorm
4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

  1.0 = tf(termFreq(title_t:hanger)=1)
  9.265228 = idf(docFreq=425, numDocs=1655589)
  0.4375 = fieldNorm(field=title_t, doc=3454)
3.372747 = (MATCH) max of:
  0.12199592 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product  
of:

0.09817579 = queryWeight(markup_t:hanger^0.5), product of:
  0.5 = boost
  6.559371 = idf(docFreq=6375, numDocs=1655589)
  0.029934512 = queryNorm
1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

  1.7320508 = tf(termFreq(markup_t:hanger)=3)
  6.559371 = idf(docFreq=6375, numDocs=1655589)
  0.109375 = fieldNorm(field=markup_t, doc=3454)
  3.372747 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
0.83205026 = queryWeight(title_t:hanger^3.0), product of:
  3.0 = boost
  9.265228 = idf(docFreq=425, numDocs=1655589)
  0.029934512 = queryNorm
4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

  1.0 = tf(termFreq(title_t:hanger)=1)
  9.265228 = idf(docFreq=425, numDocs=1655589)
  0.4375 = fieldNorm(field=title_t, doc=3454)

The thing that is perplexing is that the fieldNorm for the title_t  
field is
different in each of the explanations, ie: the fieldNorm using the  
old
stemmer is: 0.5

Re: Question about fieldNorm

2008-06-11 Thread Brendan Grainger

Hi Yonik,

I just realized that the stemmer does make a difference because of synonyms. So on indexing with the new stemmer, "converter hanger assembly replacement" gets expanded to "converter hanger assembly assemble replacement", so there are 5 terms, which gets a length norm of 0.4472136 instead of 0.5. Still unsure how it gets 0.4375 as the result for the field norm, though, unless I have a boost of 0.9783 somewhere.


Brendan


On Jun 11, 2008, at 1:37 PM, Yonik Seeley wrote:


That is strange... did you re-index or change the index?  If so, you
might want to verify that docid=3454 still corresponds to the same
document you queried earlier.

-Yonik


On Wed, Jun 11, 2008 at 1:09 PM, Brendan Grainger
[EMAIL PROTECTED] wrote:
I've just changed the stemming algorithm slightly and am running a  
few tests
against the old stemmer versus the new stemmer. I did a query for  
'hanger'
and using the old stemmer I get the following scoring for a  
document with

the title: Converter Hanger Assembly Replacement

6.4242806 = (MATCH) sum of:
2.5697122 = (MATCH) max of:
  0.2439919 = (MATCH) weight(markup_t:hanger in 3454), product of:
0.1963516 = queryWeight(markup_t:hanger), product of:
  6.5593724 = idf(docFreq=6375, numDocs=1655591)
  0.02993451 = queryNorm
1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

  1.7320508 = tf(termFreq(markup_t:hanger)=3)
  6.5593724 = idf(docFreq=6375, numDocs=1655591)
  0.109375 = fieldNorm(field=markup_t, doc=3454)
  2.5697122 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
0.5547002 = queryWeight(title_t:hanger^2.0), product of:
  2.0 = boost
  9.265229 = idf(docFreq=425, numDocs=1655591)
  0.02993451 = queryNorm
4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

  1.0 = tf(termFreq(title_t:hanger)=1)
  9.265229 = idf(docFreq=425, numDocs=1655591)
  0.5 = fieldNorm(field=title_t, doc=3454)
3.8545685 = (MATCH) max of:
  0.12199595 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product  
of:

0.0981758 = queryWeight(markup_t:hanger^0.5), product of:
  0.5 = boost
  6.5593724 = idf(docFreq=6375, numDocs=1655591)
  0.02993451 = queryNorm
1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

  1.7320508 = tf(termFreq(markup_t:hanger)=3)
  6.5593724 = idf(docFreq=6375, numDocs=1655591)
  0.109375 = fieldNorm(field=markup_t, doc=3454)
  3.8545685 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
0.8320503 = queryWeight(title_t:hanger^3.0), product of:
  3.0 = boost
  9.265229 = idf(docFreq=425, numDocs=1655591)
  0.02993451 = queryNorm
4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

  1.0 = tf(termFreq(title_t:hanger)=1)
  9.265229 = idf(docFreq=425, numDocs=1655591)
  0.5 = fieldNorm(field=title_t, doc=3454)

Using the new stemmer I get:

5.621245 = (MATCH) sum of:
2.248498 = (MATCH) max of:
  0.24399184 = (MATCH) weight(markup_t:hanger in 3454), product of:
0.19635157 = queryWeight(markup_t:hanger), product of:
  6.559371 = idf(docFreq=6375, numDocs=1655589)
  0.029934512 = queryNorm
1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

  1.7320508 = tf(termFreq(markup_t:hanger)=3)
  6.559371 = idf(docFreq=6375, numDocs=1655589)
  0.109375 = fieldNorm(field=markup_t, doc=3454)
  2.248498 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
0.5547002 = queryWeight(title_t:hanger^2.0), product of:
  2.0 = boost
  9.265228 = idf(docFreq=425, numDocs=1655589)
  0.029934512 = queryNorm
4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

  1.0 = tf(termFreq(title_t:hanger)=1)
  9.265228 = idf(docFreq=425, numDocs=1655589)
  0.4375 = fieldNorm(field=title_t, doc=3454)
3.372747 = (MATCH) max of:
  0.12199592 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product  
of:

0.09817579 = queryWeight(markup_t:hanger^0.5), product of:
  0.5 = boost
  6.559371 = idf(docFreq=6375, numDocs=1655589)
  0.029934512 = queryNorm
1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

  1.7320508 = tf(termFreq(markup_t:hanger)=3)
  6.559371 = idf(docFreq=6375, numDocs=1655589)
  0.109375 = fieldNorm(field=markup_t, doc=3454)
  3.372747 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
0.83205026 = queryWeight(title_t:hanger^3.0), product of:
  3.0 = boost
  9.265228 = idf(docFreq=425, numDocs=1655589)
  0.029934512 = queryNorm
4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

  1.0 = tf(termFreq(title_t:hanger)=1)
  9.265228 = idf(docFreq=425, numDocs=1655589)
  0.4375 = fieldNorm(field=title_t, doc=3454)

The thing that is perplexing is that the fieldNorm for the title_t  
field is
different in each of the explanations, ie: the fieldNorm using

Re: Question about fieldNorm

2008-06-11 Thread Brendan Grainger

Thanks so much, that explains it.

Brendan
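
For anyone else who hits this, the rounding can be reproduced with a self-contained sketch of the 8-bit norm encoding (adapted from the SmallFloat logic Lucene uses for norms; the class and method names here are illustrative):

public class NormRounding {
  // 8-bit float: 3 mantissa bits, zero-exponent point of 15.
  static byte floatToByte315(float f) {
    int bits = Float.floatToRawIntBits(f);
    int smallfloat = bits >> (24 - 3);
    if (smallfloat <= ((63 - 15) << 3)) {
      return (bits <= 0) ? (byte) 0 : (byte) 1; // underflow
    }
    if (smallfloat >= ((63 - 15) << 3) + 0x100) {
      return -1;                                // overflow (0xFF)
    }
    return (byte) (smallfloat - ((63 - 15) << 3));
  }

  static float byte315ToFloat(byte b) {
    if (b == 0) return 0.0f;
    int bits = (b & 0xff) << (24 - 3);
    bits += (63 - 15) << 24;
    return Float.intBitsToFloat(bits);
  }

  public static void main(String[] args) {
    float lengthNorm = (float) (1.0 / Math.sqrt(5)); // 5 terms -> ~0.4472136
    System.out.println(byte315ToFloat(floatToByte315(lengthNorm))); // 0.4375
  }
}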

On Jun 11, 2008, at 4:00 PM, Yonik Seeley wrote:


Field norms have limited precision (it's encoded as an 8 bit float) so
you are probably seeing rounding.

-Yonik

On Wed, Jun 11, 2008 at 2:13 PM, Brendan Grainger
[EMAIL PROTECTED] wrote:

Hi Yonik,

I just realized that the stemmer does make a difference because of  
synonyms.
So on indexing using the new stemmer converter hanger assembly  
replacement
gets expanded to: converter hanger assembly assemble replacement  
so there
are 5 terms which gets a length norm of 0.4472136 instead of 0.5.  
Still
unsure how it gets 0.4375 though as the result for the field norm  
though

unless I have a boost of 0.9783 somewhere there.

Brendan


On Jun 11, 2008, at 1:37 PM, Yonik Seeley wrote:


That is strange... did you re-index or change the index?  If so, you
might want to verify that docid=3454 still corresponds to the same
document you queried earlier.

-Yonik


On Wed, Jun 11, 2008 at 1:09 PM, Brendan Grainger
[EMAIL PROTECTED] wrote:


I've just changed the stemming algorithm slightly and am running  
a few

tests
against the old stemmer versus the new stemmer. I did a query for
'hanger'
and using the old stemmer I get the following scoring for a  
document with

the title: Converter Hanger Assembly Replacement

6.4242806 = (MATCH) sum of:
2.5697122 = (MATCH) max of:
0.2439919 = (MATCH) weight(markup_t:hanger in 3454), product of:
  0.1963516 = queryWeight(markup_t:hanger), product of:
6.5593724 = idf(docFreq=6375, numDocs=1655591)
0.02993451 = queryNorm
  1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

1.7320508 = tf(termFreq(markup_t:hanger)=3)
6.5593724 = idf(docFreq=6375, numDocs=1655591)
0.109375 = fieldNorm(field=markup_t, doc=3454)
2.5697122 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
  0.5547002 = queryWeight(title_t:hanger^2.0), product of:
2.0 = boost
9.265229 = idf(docFreq=425, numDocs=1655591)
0.02993451 = queryNorm
  4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

1.0 = tf(termFreq(title_t:hanger)=1)
9.265229 = idf(docFreq=425, numDocs=1655591)
0.5 = fieldNorm(field=title_t, doc=3454)
3.8545685 = (MATCH) max of:
0.12199595 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product  
of:

  0.0981758 = queryWeight(markup_t:hanger^0.5), product of:
0.5 = boost
6.5593724 = idf(docFreq=6375, numDocs=1655591)
0.02993451 = queryNorm
  1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

1.7320508 = tf(termFreq(markup_t:hanger)=3)
6.5593724 = idf(docFreq=6375, numDocs=1655591)
0.109375 = fieldNorm(field=markup_t, doc=3454)
3.8545685 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
  0.8320503 = queryWeight(title_t:hanger^3.0), product of:
3.0 = boost
9.265229 = idf(docFreq=425, numDocs=1655591)
0.02993451 = queryNorm
  4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

1.0 = tf(termFreq(title_t:hanger)=1)
9.265229 = idf(docFreq=425, numDocs=1655591)
0.5 = fieldNorm(field=title_t, doc=3454)

Using the new stemmer I get:

5.621245 = (MATCH) sum of:
2.248498 = (MATCH) max of:
0.24399184 = (MATCH) weight(markup_t:hanger in 3454), product of:
  0.19635157 = queryWeight(markup_t:hanger), product of:
6.559371 = idf(docFreq=6375, numDocs=1655589)
0.029934512 = queryNorm
  1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

1.7320508 = tf(termFreq(markup_t:hanger)=3)
6.559371 = idf(docFreq=6375, numDocs=1655589)
0.109375 = fieldNorm(field=markup_t, doc=3454)
2.248498 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
  0.5547002 = queryWeight(title_t:hanger^2.0), product of:
2.0 = boost
9.265228 = idf(docFreq=425, numDocs=1655589)
0.029934512 = queryNorm
  4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

1.0 = tf(termFreq(title_t:hanger)=1)
9.265228 = idf(docFreq=425, numDocs=1655589)
0.4375 = fieldNorm(field=title_t, doc=3454)
3.372747 = (MATCH) max of:
0.12199592 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product  
of:

  0.09817579 = queryWeight(markup_t:hanger^0.5), product of:
0.5 = boost
6.559371 = idf(docFreq=6375, numDocs=1655589)
0.029934512 = queryNorm
  1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

1.7320508 = tf(termFreq(markup_t:hanger)=3)
6.559371 = idf(docFreq=6375, numDocs=1655589)
0.109375 = fieldNorm(field=markup_t, doc=3454)
3.372747 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
  0.83205026 = queryWeight(title_t:hanger^3.0), product of:
3.0 = boost
9.265228 = idf(docFreq=425, numDocs=1655589)
0.029934512 = queryNorm
  4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

1.0 = tf(termFreq(title_t:hanger)=1)
9.265228 = idf(docFreq=425, numDocs=1655589)
0.4375 = fieldNorm(field=title_t, doc=3454

Re: Searching for empty fields

2008-05-06 Thread Brendan Grainger

Hi,

Not sure if this is what you want, but to search for 'empty' fields we  
use something like this:


(*:* AND -color:[* TO *])
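
As a filter query, a full request might look like this (host, port, and handler are illustrative):

http://localhost:8983/solr/select?q=*:*&fq=(*:*+AND+-color:[*+TO+*])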

Hope that helps.

Brendan

On May 6, 2008, at 6:43 PM, Daniel Andersson wrote:


Hi (again)

One of the fields in my database is color. It can either contain a  
value (blue, red etc) or be blank. When I perform a search with  
facet counts on, I get a count for _empty_.


How do I go about searching for this?

I've tried color: which gives me an error. Same with color:.  
color:_empty_ returns nothing at all.


Thanks in advance!

/ d




Re: Question about dismax query parsing

2008-03-14 Thread Brendan Grainger

Got it.

Thanks so much.

Brendan

On Mar 14, 2008, at 8:11 AM, Erik Hatcher wrote:



On Mar 13, 2008, at 3:06 AM, Brendan Grainger wrote:
Just started using the Dismax handler and it looks very promising.  
However I'm a little confused about this query. Could somebody  
please explain why I'm getting a phrase query here?


You're not actually getting a PhraseQuery.  The  
DisjunctionMaxQuery#toString() puts a ~ in for the tie breaker.   
In general, a Query#toString is lossy/confusing - but better than  
nothing.


public String toString(String field) {
  StringBuffer buffer = new StringBuffer();
  buffer.append("(");
  for (int i = 0; i < disjuncts.size(); i++) {
    Query subquery = (Query) disjuncts.get(i);
    if (subquery instanceof BooleanQuery) {   // wrap sub-bools in parens
      buffer.append("(");
      buffer.append(subquery.toString(field));
      buffer.append(")");
    }
    else buffer.append(subquery.toString(field));
    if (i != disjuncts.size()-1) buffer.append(" | ");
  }
  buffer.append(")");
  if (tieBreakerMultiplier != 0.0f) {
    buffer.append("~");
    buffer.append(tieBreakerMultiplier);
  }
  if (getBoost() != 1.0) {
    buffer.append("^");
    buffer.append(getBoost());
  }
  return buffer.toString();
}



+(((title_t:mass) (title_t:air) (title_t:flow))~3) ()',
And is that extra () indicative of something? I have some stuff  
going on with synonyms and I'm wondering if the position of the  
tokens is off and causing this.


That extra clause is Solr's dismax code putting in an empty clause -  
I think that is the boosting query(?).


Erik


The relevant output from debugQuery is below:


 'rawquerystring'='mass air flow',
 'querystring'='mass air flow',
 'parsedquery'='+((DisjunctionMaxQuery((title_t:mass))  
DisjunctionMaxQuery((title_t:air))  
DisjunctionMaxQuery((title_t:flow)))~3) ()',
 'parsedquery_toString'='+(((title_t:mass) (title_t:air)  
(title_t:flow))~3) ()',


Thanks!






Re: does solr handle hierarchical facets?

2007-12-17 Thread Brendan Grainger
This approach works (I do a similar thing using solr), but you have to be careful, as a BooleanQuery.TooManyClauses exception can be thrown depending on where you use the wildcard. It should be fine in the case you described, however. Anyway, there is a pretty interesting discussion about this here:


http://www.usit.uio.no/it/vortex/arbeidsomrader/metadata/lucene/limitations.html


Brendan


On Dec 17, 2007, at 10:39 PM, George Everitt wrote:




On Dec 13, 2007, at 1:56 AM, Chris Hostetter wrote:


ie, if this is your hierarchy...

   Products/
   Products/Computers/
   Products/Computers/Laptops
   Products/Computers/Desktops
   Products/Cases
   Products/Cases/Laptops
   Products/Cases/CellPhones

Then this trick won't work (because Laptops appears twice), but if you have numeric IDs that correspond with each of those categories (so that the two instances of Laptops are unique...

   1/
   1/2/
   1/2/3
   1/2/4
   1/5/
   1/5/6
   1/5/7


Why not just use the whole path as the unique identifying token for  
a given node on the hierarchy?   That way, you don't need to map  
nodes to unique numbers, just use a prefix query.


taxonomy:Products/Computers/Laptops* or taxonomy:Products/Cases/Laptops*


Sorry - that may be bogus query syntax, but you get the idea.

Products/Computers/Laptops* and Products/Cases/Laptops* are two  
unique identifiers.  You just need to make sure they are tokenized  
properly - which is beyond my current off-the-cuff expertise.


At least that is the way I've been doing it with IDOL lately.  I  
dearly hope I can do the same in Solr when the time comes.


I have a whole mess of Java code which parses out arbitrary path  
separated values into real tree structures.  I think it would be a  
useful addition to Solr, or maybe Solrj.  It's been knocking around  
my hard drives for the better part of a decade.   If I get enough  
interest, I'll clean it up and figure out how to offer it up as a  
part of the code base.  I'm pretty naive when it comes to FLOSS, so  
any authoritative non-condescending hints on how to go about this  
would be greatly appreciated.


Regards,
George




Re: does solr handle hierarchical facets?

2007-12-10 Thread Brendan Grainger

Hi,

I'm not sure if this is a good way to do it or not (so comments are  
more than welcome!), but the way we have achieved this is using the  
idea that a category/subcategory/subsubcategory etc create a path  
that we associate with a document. This is the simple field  
definition we use:


<fieldType name="category_facet" class="solr.TextField" sortMissingLast="true" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

Then, to search for all docs in, say, Electronics/Computers/Apple, we submit a filter query something like:

fq=category_facet:Electronics/Computers/Apple

To get all computers we use:

fq=category_facet:Electronics/Computers*

Seems to work. I'd love to know what other people do?

Brendan


On Dec 10, 2007, at 8:54 AM, Sean Laval wrote:



eg. category/subcategory/subsubcategory?

such that if you search for category, you get all those documents  
that have been tagged with the category AND any sub categories. If  
this is possible I think I'll investigate using solr in place of  
some existing code we have that deals with indexing and searching  
of such data.


Regards,

Sean
_
Get free emoticon packs and customisation from Windows Live.
http://www.pimpmylive.co.uk




Re: Issue with 2WD and 4WD in query

2007-12-10 Thread Brendan Grainger

Hi Matt,

Thanks for the reply. I've done what you said and I get exactly what  
you're saying as a result. Any ideas about how to make 2WD and 4WD be  
terms on their own?


Thanks

On Dec 10, 2007, at 11:41 AM, Matt Kangas wrote:

Brendan, pull up your Solr Admin Analysis page and try running  
your queries through that. The output will tell you precisely how  
each analyzer affects your tokens on either the index or query side.


In my own quick test, WordDelimiterFilterFactory seems inclined to  
break 2WD into (2,WD)


(using org.apache.solr.analysis.WordDelimiterFilterFactory  
{catenateWords=1, catenateNumbers=1, catenateAll=0,  
generateNumberParts=1, generateWordParts=1})


--matt

On Dec 9, 2007, at 6:41 PM, Brendan Grainger wrote:


Hi,

I hope you can help me. I'm having an odd problem with solr. I have a field that could represent a car. A car could have a name like "Silverado" or could be something like "Silverado 2WD" to denote the 2 wheel drive version of the car. Anyway, all is well when I search over the field for "Silverado", but when I try searching for "2WD" (doesn't matter what case) nothing is returned. Same applies for "Silverado 2WD" etc. I currently have the field defined as text, ie:


<field name="car_name" type="text" indexed="true" stored="true" />

But I've also tried defining my own (simpler) field with no luck.  
FYI my text field is defined like this:


<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- This is supposed to remove HTML tags before indexing -->
    <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
    <!--
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Any help?

Thanks!
Brendan


--
Matt Kangas / [EMAIL PROTECTED]






Issue with 2WD and 4WD in query

2007-12-09 Thread Brendan Grainger

Hi,

I hope you can help me. I'm having an odd problem with solr. I have a field that could represent a car. A car could have a name like "Silverado" or could be something like "Silverado 2WD" to denote the 2 wheel drive version of the car. Anyway, all is well when I search over the field for "Silverado", but when I try searching for "2WD" (doesn't matter what case) nothing is returned. Same applies for "Silverado 2WD" etc. I currently have the field defined as text, ie:


<field name="car_name" type="text" indexed="true" stored="true" />

But I've also tried defining my own (simpler) field with no luck. FYI  
my text field is defined like this:


<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- This is supposed to remove HTML tags before indexing -->
    <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
    <!--
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Any help?

Thanks!
Brendan

Re: Any tips for indexing large amounts of data?

2007-11-21 Thread Brendan Grainger

Hi Otis,

Thanks for the reply. I am using a pretty vanilla approach right now and it's taking about 30 hours to build an index of about 5.5Gb. Can you please tell me about some of the changes you made to optimize the indexing process?


Thanks
Brendan

On Nov 21, 2007, at 2:27 AM, Otis Gospodnetic wrote:

Just tried a search for web on this index - 1.1 seconds.  This  
matches about 1MM of about 20MM docs.  Redo the search, and it's 1  
ms (cached).  This is without any load nor serious benchmarking,  
clearly.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: Eswar K [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, November 21, 2007 2:11:07 AM
Subject: Re: Any tips for indexing large amounts of data?

Hi Otis,

I understand this is a slightly off-track question, but I am just curious to know the performance of search on a 20 GB index file. What has been your observation?

Regards,
Eswar

On Nov 21, 2007 12:33 PM, Otis Gospodnetic  
[EMAIL PROTECTED]

wrote:


Mike is right about the occasional slow-down, which appears as a pause and is due to large Lucene index segment merging.  This should go away with newer versions of Lucene where this is happening in the background.

That said, we just indexed about 20MM documents on a single 8-core machine with 8 GB of RAM, resulting in nearly 20 GB index.  The whole process took a little less than 10 hours - that's over 550 docs/second.  The vanilla approach before some of our changes apparently required several days to index the same amount of data.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: Mike Klaas [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Monday, November 19, 2007 5:50:19 PM
Subject: Re: Any tips for indexing large amounts of data?

There should be some slowdown in larger indices as occasionally large
segment merge operations must occur.  However, this shouldn't really
affect overall speed too much.

You haven't really given us enough data to tell you anything useful.
I would recommend trying to do the indexing via a webapp to eliminate
all your code as a possible factor.  Then, look for signs to what is
happening when indexing slows.  For instance, is Solr high in cpu, is
the computer thrashing, etc?

-Mike

On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:


Hi,

Thanks for answering this question a while back. I have made some
of the suggestions you mentioned. ie not committing until I've
finished indexing. What I am seeing though, is as the index get
larger (around 1Gb), indexing is taking a lot longer. In fact it
slows down to a crawl. Have you got any pointers as to what I might
be doing wrong?

Also, I was looking at using MultiCore solr. Could this help in
some way?

Thank you
Brendan

On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:



: I would think you would see better performance by allowing auto
commit
: to handle the commit size instead of reopening the connection
all the
: time.

if your goal is fast indexing, don't use autoCommit at all ...

 just

index everything, and don't commit until you are completely done.

autoCommitting will slow your indexing down (the benefit being
that more
results will be visible to searchers as you proceed)




-Hoss

















Re: Any tips for indexing large amounts of data?

2007-11-21 Thread Brendan Grainger

Hi Otis,

Thanks for this. Are you using a flavor of linux and is it 64bit? How  
much heap are you giving your jvm?


Thanks again
Brendan

On Nov 21, 2007, at 2:03 AM, Otis Gospodnetic wrote:

Mike is right about the occasional slow-down, which appears as a  
pause and is due to large Lucene index segment merging.  This  
should go away with newer versions of Lucene where this is  
happening in the background.


That said, we just indexed about 20MM documents on a single 8-core  
machine with 8 GB of RAM, resulting in nearly 20 GB index.  The  
whole process took a little less than 10 hours - that's over 550  
docs/second.  The vanilla approach before some of our changes  
apparently required several days to index the same amount of data.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 
From: Mike Klaas [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Monday, November 19, 2007 5:50:19 PM
Subject: Re: Any tips for indexing large amounts of data?

There should be some slowdown in larger indices as occasionally large
segment merge operations must occur.  However, this shouldn't really
affect overall speed too much.

You haven't really given us enough data to tell you anything useful.
I would recommend trying to do the indexing via a webapp to eliminate
all your code as a possible factor.  Then, look for signs to what is
happening when indexing slows.  For instance, is Solr high in cpu, is
the computer thrashing, etc?

-Mike

On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:


Hi,

Thanks for answering this question a while back. I have made some
of the suggestions you mentioned. ie not committing until I've
finished indexing. What I am seeing though, is as the index get
larger (around 1Gb), indexing is taking a lot longer. In fact it
slows down to a crawl. Have you got any pointers as to what I might
be doing wrong?

Also, I was looking at using MultiCore solr. Could this help in
some way?

Thank you
Brendan

On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:



: I would think you would see better performance by allowing auto
commit
: to handle the commit size instead of reopening the connection
all the
: time.

if your goal is fast indexing, don't use autoCommit at all ...

 just

index everything, and don't commit until you are completely done.

autoCommitting will slow your indexing down (the benefit being
that more
results will be visible to searchers as you proceed)




-Hoss












Re: Any tips for indexing large amounts of data?

2007-11-19 Thread Brendan Grainger

Hi,

Thanks for answering this question a while back. I have made some of the suggestions you mentioned, i.e. not committing until I've finished indexing. What I am seeing, though, is that as the index gets larger (around 1Gb), indexing takes a lot longer. In fact it slows down to a crawl. Have you got any pointers as to what I might be doing wrong?


Also, I was looking at using MultiCore solr. Could this help in some  
way?


Thank you
Brendan

On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:



: I would think you would see better performance by allowing auto  
commit
: to handle the commit size instead of reopening the connection all  
the

: time.

if your goal is fast indexing, don't use autoCommit at all ... just
index everything, and don't commit until you are completely done.

autoCommitting will slow your indexing down (the benefit being that  
more

results will be visible to searchers as you proceed)




-Hoss





Re: Any tips for indexing large amounts of data?

2007-11-02 Thread Brendan Grainger
Thanks so much for your suggestions. I am attempting to index 550K docs at once, but have found I've had to break them up into smaller batches. Indexing seems to stop at around 47K docs (the index reaches 264M in size at this point). The index itself eventually grows to about 2Gb. I am using embedded solr and adding a document with code very similar to this:




private void addModel(Model model) throws IOException {
    UpdateHandler updateHandler = solrCore.getUpdateHandler();
    AddUpdateCommand addcmd = new AddUpdateCommand();

    DocumentBuilder builder = new DocumentBuilder(solrCore.getSchema());

    builder.startDoc();
    builder.addField("id", "Model:" + model.getUuid());
    builder.addField("class", "Model");
    builder.addField("uuid", model.getUuid());
    builder.addField("one_facet", model.getOneFacet());
    builder.addField("another_facet", model.getAnotherFacet());

    // .. other fields

    addcmd.doc = builder.getDoc();
    addcmd.allowDups = false;
    addcmd.overwritePending = true;
    addcmd.overwriteCommitted = true;
    updateHandler.addDoc(addcmd);
}

I have other 'Model' objects I'm adding also.
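
A minimal sketch of committing once after the last batch with the same embedded API (assuming the Solr 1.2-era CommitUpdateCommand(boolean optimize) constructor; commitAll is a hypothetical method name):

private void commitAll() throws IOException {
    // commit only; passing true would also optimize the index
    CommitUpdateCommand commit = new CommitUpdateCommand(false);
    solrCore.getUpdateHandler().commit(commit);
}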

Thanks

On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:



: I would think you would see better performance by allowing auto  
commit
: to handle the commit size instead of reopening the connection all  
the

: time.

if your goal is fast indexing, don't use autoCommit at all ... just
index everything, and don't commit until you are completely done.

autoCommitting will slow your indexing down (the benefit being that  
more

results will be visible to searchers as you proceed)




-Hoss





Any tips for indexing large amounts of data?

2007-10-31 Thread Brendan Grainger

Hi,

I am creating an index of approx 500K documents. I wrote an indexing program using embedded solr (http://wiki.apache.org/solr/EmbeddedSolr) and am seeing probably a 10-fold increase in indexing speed. My problem, though, is that if I try to reindex say 20K docs at a time it slows down considerably. I currently batch my updates in lots of 100, and between batches I close and reopen the connection to solr like so:


private void openConnection(String environment) throws ParserConfigurationException, IOException, SAXException {
    System.setProperty("solr.solr.home", SOLR_HOME);
    solrConfig = new SolrConfig("solrconfig.xml");
    solrCore = new SolrCore(SOLR_HOME + "data/" + environment, solrConfig, new IndexSchema(solrConfig, "schema.xml"));

    logger.debug("Opened solr connection");
}

private void closeConnection() {
    solrCore.close();
    solrCore = null;
    logger.debug("Closed solr connection");
}

Does anyone have any pointers or see anything obvious I'm doing wrong?

Thanks


PS Sorry if this is posted twice.