Unique id

2008-11-19 Thread Raghunandan Rao
Hi,

Is the uniqueKey in schema.xml really required?

 

The reason is, I am indexing two tables and I have id as the unique key in
schema.xml, but the id field is not there in one of the tables, so indexing
fails. Do I really require this unique field for Solr to index better,
or can I do away with it?

 

Thanks,

Rahgu



RE: Unique id

2008-11-19 Thread Raghunandan Rao
Ok, got it.
I am indexing two tables differently. I am using SolrJ to index with the
@Field annotation. I make two queries initially and fetch the data from
the two tables and index them separately. But what if the ids in the two
tables are the same? That means documents with the same id will be deleted
when doing an update.

How does this work? Please explain. 

Thanks. 
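One common workaround, sketched here with a made-up bean class: build the
uniqueKey value from a per-table prefix plus the database id, so ids from
the two tables can never collide:

import org.apache.solr.client.solrj.beans.Field;

public class EmployeeDoc {
    // uniqueKey value: table prefix + database id, e.g. "emp-42"
    @Field("id")
    String id;

    @Field("name")
    String name;

    public EmployeeDoc(long dbId, String name) {
        this.id = "emp-" + dbId;  // "emp-" is a made-up prefix; any per-table prefix works
        this.name = name;
    }
}

Documents from the other table would get a different prefix (say "cust-")
before being indexed with server.addBean(...), so an update only ever
replaces a document from the same table.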

-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 19, 2008 3:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

Yes it is. You need a unique id because the add method works as an
add-or-update method. When adding a document whose ID is already found in
the index, the old document will be deleted and the new one will be added.
Are you indexing two tables into the same index? Or does one entry in the
index consist of data from both tables? How are these linked together
without an ID?
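In SolrJ terms, the behavior looks like this; a minimal sketch, where the
URL and field names are illustrative:

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AddOrUpdate {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "42");               // the uniqueKey field
        doc.addField("name", "first version");
        server.add(doc);

        doc = new SolrInputDocument();
        doc.addField("id", "42");               // same id: replaces the document above
        doc.addField("name", "second version");
        server.add(doc);

        server.commit();  // the index now holds only "second version"
    }
}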

- Aleksander

On Wed, 19 Nov 2008 10:42:00 +0100, Raghunandan Rao  
[EMAIL PROTECTED] wrote:

 Hi,

 Is the uniqueKey in schema.xml really required?


 Reason is, I am indexing two tables and I have id as unique key in
 schema.xml but id field is not there in one of the tables and indexing
 fails. Do I really require this unique field for Solr to index it
better
 or can I do away with this?


 Thanks,

 Rahgu




-- 
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no


Re: Error in indexing timestamp format.

2008-11-19 Thread con


Hi Noble,
Thank you very much.
That removed the error at server startup.

But I don't think the data is getting indexed when running the dataimport;
I am unable to display the date field values when searching.
This is my complete config:

<entity name="employees"
    transformer="TemplateTransformer,DateFormatTransformer" pk="EMP_ID"
    query="select EMP_ID, CREATED_DATE, CUST_ID FROM EMP, CUST where EMP.EMP_ID = CUST.EMP_ID">
  <field column="rowtype" template="employees" />
  <field column="EMP_ID" name="EMP_ID" />
  <field column="CUST_ID" name="CUST_ID" />
  <field column="CREATED_DATE" sourceColName="CREATED_DATE"
      dateTimeFormat="dd-MM-yy HH:mm:ss.S a" />
</entity>

In the schema.xml I have:

<field name="CREATED_DATE" type="date" indexed="true" stored="true" />

<copyField source="CREATED_DATE" dest="CREATED_DATE"/>


Do I need some other configuration?

Thanks in advance
con
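One way to sanity-check the dateTimeFormat pattern in isolation:
DataImportHandler's DateFormatTransformer parses with
java.text.SimpleDateFormat, so the pattern can be tried directly against a
sample value. A minimal sketch; the sample value below is made up:

import java.text.ParseException;
import java.text.SimpleDateFormat;

public class PatternCheck {
    public static void main(String[] args) throws ParseException {
        // same pattern as the dateTimeFormat attribute in the entity above
        SimpleDateFormat sdf = new SimpleDateFormat("dd-MM-yy HH:mm:ss.S a");
        // made-up sample; substitute a real CREATED_DATE value from the EMP table
        System.out.println(sdf.parse("19-11-08 10:18:30.8 AM"));
    }
}

If this throws a ParseException for real column values, the pattern itself
(not the Solr config) is the problem.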






Noble Paul നോബിള്‍ नोब्ळ् wrote:
 
 sorry I meant wrong dest field name
 
 On Wed, Nov 19, 2008 at 12:41 PM, con [EMAIL PROTECTED] wrote:

 Hi Nobble

 I have cross checked. This is my copy field of schema.xml

   <copyField source="CREATED_DATE" dest="date" />

 I am still getting that error.

 thanks
 con



 Noble Paul നോബിള്‍ नोब्ळ् wrote:

your copyField has the wrong source field name. The field name is not
'date', it is 'CREATED_DATE'.

 On Wed, Nov 19, 2008 at 11:49 AM, con [EMAIL PROTECTED] wrote:

 Hi Shalin
 Please find the log data.

 10:18:30,819 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.servlet.SolrDispatchFilter init
 INFO: SolrDispatchFilter.init()
 10:18:30,838 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader locateInstanceDir
 INFO: No /solr/home in JNDI
 10:18:30,839 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader locateInstanceDir
 INFO: using system property solr.solr.home: C:\Search\solr
 10:18:30,844 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.CoreContainer$Initializer initialize
 INFO: looking for solr.xml: C:\Search\solr\solr.xml
 10:18:30,845 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader init
 INFO: Solr home set to 'C:\Search\solr/'
 10:18:30,846 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader createClassLoader
 INFO: Adding 'file:/C:/Search/solr/lib/jetty-6.1.3.jar' to Solr
 classloader
 10:18:30,847 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader createClassLoader
 INFO: Adding 'file:/C:/Search/solr/lib/jetty-util-6.1.3.jar' to Solr
 classloader
 10:18:30,848 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader createClassLoader
 INFO: Adding 'file:/C:/Search/solr/lib/jsp-2.1/' to Solr classloader
 10:18:30,848 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader createClassLoader
 INFO: Adding 'file:/C:/Search/solr/lib/ojdbc6-11.1.0.6.0.1.jar' to Solr
 classloader
 10:18:30,849 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader createClassLoader
 INFO: Adding 'file:/C:/Search/solr/lib/servlet-api-2.5-6.1.3.jar' to
 Solr
 classloader
 10:18:30,864 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.CoreContainer load
 INFO: loading shared library: C:\Search\solr\lib
 10:18:30,867 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader createClassLoader
 INFO: Adding 'file:/C:/Search/solr/lib/jetty-6.1.3.jar' to Solr
 classloader
 10:18:30,870 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader createClassLoader
 INFO: Adding 'file:/C:/Search/solr/lib/jetty-util-6.1.3.jar' to Solr
 classloader
 10:18:30,870 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader createClassLoader
 INFO: Adding 'file:/C:/Search/solr/lib/jsp-2.1/' to Solr classloader
 10:18:30,871 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader createClassLoader
 INFO: Adding 'file:/C:/Search/solr/lib/ojdbc6-11.1.0.6.0.1.jar' to Solr
 classloader
 10:18:30,872 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader createClassLoader
 INFO: Adding 'file:/C:/Search/solr/lib/servlet-api-2.5-6.1.3.jar' to
 Solr
 classloader
 10:18:30,896 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader init
 INFO: Solr home set to 'C:\Search\solr\feedback/'
 10:18:30,896 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader createClassLoader
 INFO: Reusing parent classloader
 10:18:31,328 ERROR [STDERR] 19 Nov, 2008 10:18:31 AM
 org.apache.solr.core.SolrConfig init
 INFO: Loaded SolrConfig: solrconfig.xml
 10:18:31,370 ERROR [STDERR] 19 Nov, 2008 10:18:31 AM
 org.apache.solr.schema.IndexSchema readSchema
 INFO: Reading Solr Schema
 10:18:31,381 ERROR [STDERR] 19 Nov, 2008 10:18:31 AM
 org.apache.solr.schema.IndexSchema readSchema
 INFO: Schema name=feedback schema
 

Re: Unique id

2008-11-19 Thread Aleksander M. Stensby
Ok, but how do you map your table structure to the index? As far as I can
understand, the two tables have different structures, so why/how do you map
two different data structures onto a single index? Are the two tables
connected in some way? If so, you could make your index structure reflect
the union of both tables and just make one insertion into the index per
entry of the two tables.


Maybe you could post the table structure so that I can get a better  
understanding of your use-case...


- Aleks

On Wed, 19 Nov 2008 11:25:56 +0100, Raghunandan Rao  
[EMAIL PROTECTED] wrote:



Ok got it.
I am indexing two tables differently. I am using Solrj to index with
@Field annotation. I make two queries initially and fetch the data from
two tables and index them separately. But what if the ids in two tables
are same? That means documents with same id will be deleted when doing
update.

How does this work? Please explain.

Thanks.

-Original Message-
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 19, 2008 3:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

Yes it is. You need a unique id because the add method works as an
add-or-update method. When adding a document whose ID is already found in
the index, the old document will be deleted and the new one will be added.
Are you indexing two tables into the same index? Or does one entry in the
index consist of data from both tables? How are these linked together
without an ID?

- Aleksander

On Wed, 19 Nov 2008 10:42:00 +0100, Raghunandan Rao
[EMAIL PROTECTED] wrote:


Hi,

Is the uniqueKey in schema.xml really required?


Reason is, I am indexing two tables and I have id as unique key in
schema.xml but id field is not there in one of the tables and indexing
fails. Do I really require this unique field for Solr to index it

better

or can I do away with this?


Thanks,

Rahgu






Re: Use SOLR like the MySQL LIKE

2008-11-19 Thread Norberto Meijome
On Tue, 18 Nov 2008 14:26:02 +0100
Aleksander M. Stensby [EMAIL PROTECTED] wrote:

 Well, then I suggest you index the field in two different ways if you want
 both possible ways of searching. One where you treat the entire name as
 one token (in lowercase), so you can search for avera* and match, for
 instance, average joe etc. And then another field where you tokenize on
 whitespace, if you want/need that possibility as well. Look at the Solr
 copy fields and try it out, it works like a charm :)

You should also make extensive use of  analysis.jsp  to see how data in your
field (1) is tokenized, filtered and indexed, and how your search terms are
tokenized, filtered and matched against (1). 
Hint 1 : check all the checkboxes ;)
Hint 2: you don't need to reindex all your data, just enter test data in the
form and give it a go. You will of course have to tweak schema.xml and restart
your service when you do this.
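A minimal sketch of that two-field setup in schema.xml; the field and type
names here are made up, so adapt them to your schema:

<fieldType name="text_exact" class="solr.TextField">
  <analyzer>
    <!-- whole value becomes one lowercased token, so avera* matches "average joe" -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name" type="text" indexed="true" stored="true"/>
<field name="name_exact" type="text_exact" indexed="true" stored="false"/>
<copyField source="name" dest="name_exact"/>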

good luck,
B
_
{Beto|Norberto|Numard} Meijome

Intellectual: 'Someone who has been educated beyond his/her intelligence'
   Arthur C. Clarke, from 3001, The Final Odyssey, Sources.

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Unique id

2008-11-19 Thread Aleksander M. Stensby
Yes it is. You need a unique id because the add method works as an
add-or-update method. When adding a document whose ID is already found in
the index, the old document will be deleted and the new one will be added.
Are you indexing two tables into the same index? Or does one entry in the
index consist of data from both tables? How are these linked together
without an ID?


- Aleksander

On Wed, 19 Nov 2008 10:42:00 +0100, Raghunandan Rao  
[EMAIL PROTECTED] wrote:



Hi,

Is the uniqueKey in schema.xml really required?


Reason is, I am indexing two tables and I have id as unique key in
schema.xml but id field is not there in one of the tables and indexing
fails. Do I really require this unique field for Solr to index it better
or can I do away with this?


Thanks,

Rahgu





--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no


Upgrade from 1.2 to 1.3 gives 3x slowdown

2008-11-19 Thread Fergus McMenemie
Hello,

I have a CSV file with 6M records which took 22min to index with
solr 1.2. I then stopped tomcat, replaced the solr stuff inside
webapps with version 1.3, wiped my index and restarted tomcat.

Indexing the exact same content now takes 69min. My machine has
2GB of RAM and tomcat is running with $JAVA_OPTS -Xmx512M -Xms512M.

Are there any tweaks I can use to get the original index time
back? I read through the release notes and was expecting a
speed-up. I saw the bit about increasing ramBufferSizeMB and set
it to 64MB; it had no effect.
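For reference, the setting in question sits in the indexDefaults section of
solrconfig.xml; a sketch with the 64MB value tried above:

<indexDefaults>
  <ramBufferSizeMB>64</ramBufferSizeMB>
</indexDefaults>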
-- 

===
Fergus McMenemie   Email:[EMAIL PROTECTED]
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: Unique id

2008-11-19 Thread Erik Hatcher
Technically, no, a uniqueKey field is NOT required.  I've yet to run  
into a situation where it made sense not to use one though.
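For reference, the declaration in question is a one-liner in schema.xml;
the "id" field name here is just the stock example:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>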


As for indexing database tables - if one of your tables doesn't have a  
primary key, does it have an aggregate unique key of some sort?  Do  
you plan on updating the rows in that table and reindexing them?   
Seems like some kind of unique key would make sense for updating  
documents.


But yeah, a more detailed description of your table structure and  
searching needs would be helpful.


Erik


On Nov 19, 2008, at 5:18 AM, Aleksander M. Stensby wrote:

Yes it is. You need a unique id because the add method works as an
add-or-update method. When adding a document whose ID is already
found in the index, the old document will be deleted and the new one
will be added. Are you indexing two tables into the same index? Or
does one entry in the index consist of data from both tables? How
are these linked together without an ID?


- Aleksander

On Wed, 19 Nov 2008 10:42:00 +0100, Raghunandan Rao [EMAIL PROTECTED] 
 wrote:



Hi,

Is the uniqueKey in schema.xml really required?


Reason is, I am indexing two tables and I have id as unique key in
schema.xml but id field is not there in one of the tables and  
indexing
fails. Do I really require this unique field for Solr to index it  
better

or can I do away with this?


Thanks,

Rahgu





--
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no




DataImportHandler: Javascript transformer for splitting field-values

2008-11-19 Thread Steffen
Hi everyone,
I'm currently working with the nightly build of Solr (solr-2008-11-17)
and trying to figure out how to transform a row-object with Javascript
to include multiple values (in a single multivalued field). When I try
something like this as a transformer:
function splitTerms(row) {
//each term should be duplicated into count field-values
//dummy-code to show the idea
row.put('terms',['term','term','term']);
return row;
}
[...]
<entity name="searchfeedback" pk="id" transformer="script:splitTerms"
    query="SELECT term,count FROM termtable WHERE id=${parent.id}" />

The DataImportHandler debugger returns:

<arr>
  <str>sun.org.mozilla.javascript.internal.NativeArray:[EMAIL PROTECTED]</str>
</arr>

What it *should* return:

<arr>
  <str>term</str>
  <str>term</str>
  <str>term</str>
</arr>

So, what am I doing wrong? My transformer will be invoked multiple
times from a MySQL query and in turn has to insert multiple values into
the same field during each invocation. It should do something similar
to the RegexTransformer (field splitBy)... is that possible? Right now
I have to use a workaround that duplicates the terms on the database
side, which is kinda ugly if a term has to be duplicated a lot.
Greetings,
Steffen


Question about autocommit

2008-11-19 Thread Nickolai Toupikov

Hello,
I would like some details on the autocommit mechanism. I tried to search
the wiki, but found only the standard maxDoc/time settings.
I have set the autocommit parameters in solrconfig.xml to 8000 docs and 30milis.
Indexing at around 200 docs per second (from multiple processes, using
the CommonsHttpSolrServer class), I would have expected autocommits to
occur around every 40 seconds; however, the jvm log shows the following -
sometimes more than two calls per second:


$ tail -f jvm-default.log | grep commit
[16:18:15.862] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:18:16.788] {pool-2-thread-1} end_commit_flush
[16:18:21.721] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:18:22.073] {pool-2-thread-1} end_commit_flush
[16:18:36.047] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:18:36.468] {pool-2-thread-1} end_commit_flush
[16:18:36.886] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:18:37.017] {pool-2-thread-1} end_commit_flush
[16:18:37.867] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:18:38.448] {pool-2-thread-1} end_commit_flush
[16:18:44.375] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:18:47.016] {pool-2-thread-1} end_commit_flush
[16:18:47.154] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:18:47.287] {pool-2-thread-1} end_commit_flush
[16:18:50.399] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:18:51.283] {pool-2-thread-1} end_commit_flush
[16:19:13.782] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:19:14.664] {pool-2-thread-1} end_commit_flush
[16:19:15.081] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:19:15.215] {pool-2-thread-1} end_commit_flush
[16:19:15.357] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:19:15.955] {pool-2-thread-1} end_commit_flush
[16:19:16.421] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:19:19.791] {pool-2-thread-1} end_commit_flush
[16:19:50.594] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:19:52.098] {pool-2-thread-1} end_commit_flush
[16:19:52.236] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:19:52.368] {pool-2-thread-1} end_commit_flush
[16:19:52.917] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:19:53.479] {pool-2-thread-1} end_commit_flush
[16:19:54.920] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:19:55.079] {pool-2-thread-1} end_commit_flush


Additionally, on the Solr admin page the update handler reports as
many autocommits as commits - so I assume it is not some commit(); line
lost in my code.

I actually get the feeling that the commits are triggered more and more
often - with a not-so-nice influence on indexing speed over time.
Restarting resin seems to get the commit rate back to the original level.

Optimizing has no effect.
Is there some other parameter influencing autocommit?

Thank you very much.

Nickolai
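For reference, the settings described above live in the updateHandler
section of solrconfig.xml; a sketch, where the maxTime value is purely
illustrative since the exact figure above is ambiguous:

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>8000</maxDocs>
    <maxTime>30000</maxTime> <!-- milliseconds; illustrative value -->
  </autoCommit>
</updateHandler>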


Re: Question about autocommit

2008-11-19 Thread Mark Miller
Interesting... could go along with the earlier guy's post about slow
indexing...


Nickolai Toupikov wrote:

Hello,
I would like some details on the autocommit mechanism. I tried to 
search the wiki, but found only the

standard maxDoc/time settings.
i have set the autocommit parameters in solrconfig.xml to 8000 docs  
and 30milis.
Indexing at around  200 docs per second (from multiple processes, 
using the CommonsHttpSolrServer class),

i would have expected autocommits to occur around every  40 seconds,
however the jvm log shows the following -  sometimes more than two 
calls per second:


$ tail -f jvm-default.log | grep commit
[16:18:15.862] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:18:16.788] {pool-2-thread-1} end_commit_flush
[16:18:21.721] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:18:22.073] {pool-2-thread-1} end_commit_flush
[16:18:36.047] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:18:36.468] {pool-2-thread-1} end_commit_flush
[16:18:36.886] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:18:37.017] {pool-2-thread-1} end_commit_flush
[16:18:37.867] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:18:38.448] {pool-2-thread-1} end_commit_flush
[16:18:44.375] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:18:47.016] {pool-2-thread-1} end_commit_flush
[16:18:47.154] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:18:47.287] {pool-2-thread-1} end_commit_flush
[16:18:50.399] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:18:51.283] {pool-2-thread-1} end_commit_flush
[16:19:13.782] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:19:14.664] {pool-2-thread-1} end_commit_flush
[16:19:15.081] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:19:15.215] {pool-2-thread-1} end_commit_flush
[16:19:15.357] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:19:15.955] {pool-2-thread-1} end_commit_flush
[16:19:16.421] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:19:19.791] {pool-2-thread-1} end_commit_flush
[16:19:50.594] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:19:52.098] {pool-2-thread-1} end_commit_flush
[16:19:52.236] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:19:52.368] {pool-2-thread-1} end_commit_flush
[16:19:52.917] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:19:53.479] {pool-2-thread-1} end_commit_flush
[16:19:54.920] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:19:55.079] {pool-2-thread-1} end_commit_flush


additionally, in the solr admin page , the update handler reports as 
many autocommits as commits -

so i assume it is not some commit(); line lost in my code.

I actually get the feeling that the commits are triggered more and 
more often - with not-so-nice
influence on indexing speed over time. Restarting resin seems to get 
the commit rate to the original level.

Optimizing has no effect.
Is there some other parameter influencing autocommit?

Thank you very much.

Nickolai




Re: Question about autocommit

2008-11-19 Thread Mark Miller
Could also go with the thread safety issues with pending and the 
deadlock that was reported the other day. All could pretty easily be 
related. Do we have a JIRA issue on it yet? Suppose I'll look...


Mark Miller wrote:
Interesting...could go along with the earlier guys post about slow 
indexing...


Nickolai Toupikov wrote:

Hello,
I would like some details on the autocommit mechanism. I tried to 
search the wiki, but found only the

standard maxDoc/time settings.
i have set the autocommit parameters in solrconfig.xml to 8000 docs  
and 30milis.
Indexing at around  200 docs per second (from multiple processes, 
using the CommonsHttpSolrServer class),

i would have expected autocommits to occur around every  40 seconds,
however the jvm log shows the following -  sometimes more than two 
calls per second:


$ tail -f jvm-default.log | grep commit
[16:18:15.862] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:18:16.788] {pool-2-thread-1} end_commit_flush
[16:18:21.721] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:18:22.073] {pool-2-thread-1} end_commit_flush
[16:18:36.047] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:18:36.468] {pool-2-thread-1} end_commit_flush
[16:18:36.886] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:18:37.017] {pool-2-thread-1} end_commit_flush
[16:18:37.867] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:18:38.448] {pool-2-thread-1} end_commit_flush
[16:18:44.375] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:18:47.016] {pool-2-thread-1} end_commit_flush
[16:18:47.154] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:18:47.287] {pool-2-thread-1} end_commit_flush
[16:18:50.399] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:18:51.283] {pool-2-thread-1} end_commit_flush
[16:19:13.782] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:19:14.664] {pool-2-thread-1} end_commit_flush
[16:19:15.081] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:19:15.215] {pool-2-thread-1} end_commit_flush
[16:19:15.357] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:19:15.955] {pool-2-thread-1} end_commit_flush
[16:19:16.421] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:19:19.791] {pool-2-thread-1} end_commit_flush
[16:19:50.594] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:19:52.098] {pool-2-thread-1} end_commit_flush
[16:19:52.236] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:19:52.368] {pool-2-thread-1} end_commit_flush
[16:19:52.917] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:19:53.479] {pool-2-thread-1} end_commit_flush
[16:19:54.920] {pool-2-thread-1} start 
commit(optimize=false,waitFlush=true,waitSearcher=true)

[16:19:55.079] {pool-2-thread-1} end_commit_flush


additionally, in the solr admin page , the update handler reports as 
many autocommits as commits -

so i assume it is not some commit(); line lost in my code.

I actually get the feeling that the commits are triggered more and 
more often - with not-so-nice
influence on indexing speed over time. Restarting resin seems to get 
the commit rate to the original level.

Optimizing has no effect.
Is there some other parameter influencing autocommit?

Thank you very much.

Nickolai






RE: Question about autocommit

2008-11-19 Thread Nguyen, Joe
Could ramBufferSizeMB trigger the commit in this case?  

-Original Message-
From: Nickolai Toupikov [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 19, 2008 8:36 Joe
To: solr-user@lucene.apache.org
Subject: Question about autocommit

Hello,
I would like some details on the autocommit mechanism. I tried to search
the wiki, but found only the standard maxDoc/time settings.
i have set the autocommit parameters in solrconfig.xml to 8000 docs  and
30milis.
Indexing at around  200 docs per second (from multiple processes, using
the CommonsHttpSolrServer class), i would have expected autocommits to
occur around every  40 seconds, however the jvm log shows the following
-  sometimes more than two calls per second:

$ tail -f jvm-default.log | grep commit
[16:18:15.862] {pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:16.788] {pool-2-thread-1} end_commit_flush [16:18:21.721]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:22.073] {pool-2-thread-1} end_commit_flush [16:18:36.047]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:36.468] {pool-2-thread-1} end_commit_flush [16:18:36.886]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:37.017] {pool-2-thread-1} end_commit_flush [16:18:37.867]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:38.448] {pool-2-thread-1} end_commit_flush [16:18:44.375]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:47.016] {pool-2-thread-1} end_commit_flush [16:18:47.154]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:47.287] {pool-2-thread-1} end_commit_flush [16:18:50.399]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:51.283] {pool-2-thread-1} end_commit_flush [16:19:13.782]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:14.664] {pool-2-thread-1} end_commit_flush [16:19:15.081]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:15.215] {pool-2-thread-1} end_commit_flush [16:19:15.357]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:15.955] {pool-2-thread-1} end_commit_flush [16:19:16.421]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:19.791] {pool-2-thread-1} end_commit_flush [16:19:50.594]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:52.098] {pool-2-thread-1} end_commit_flush [16:19:52.236]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:52.368] {pool-2-thread-1} end_commit_flush [16:19:52.917]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:53.479] {pool-2-thread-1} end_commit_flush [16:19:54.920]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:55.079] {pool-2-thread-1} end_commit_flush


additionally, in the solr admin page , the update handler reports as
many autocommits as commits - so i assume it is not some commit(); line
lost in my code.

I actually get the feeling that the commits are triggered more and more
often - with not-so-nice influence on indexing speed over time.
Restarting resin seems to get the commit rate to the original level.
Optimizing has no effect.
Is there some other parameter influencing autocommit?

Thank you very much.

Nickolai


Re: Question about autocommit

2008-11-19 Thread Mark Miller
They are separate commits. ramBufferSizeMB controls when the underlying
Lucene IndexWriter flushes RAM to disk (this isn't the same as the
IndexWriter committing or closing). The Solr autocommit controls when Solr
asks the IndexWriter to commit what it's done so far.


Nguyen, Joe wrote:
Could ramBufferSizeMB trigger the commit in this case?  


-Original Message-
From: Nickolai Toupikov [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 19, 2008 8:36 Joe

To: solr-user@lucene.apache.org
Subject: Question about autocommit

Hello,
I would like some details on the autocommit mechanism. I tried to search
the wiki, but found only the standard maxDoc/time settings.
i have set the autocommit parameters in solrconfig.xml to 8000 docs  and
30milis.
Indexing at around  200 docs per second (from multiple processes, using
the CommonsHttpSolrServer class), i would have expected autocommits to
occur around every  40 seconds, however the jvm log shows the following
-  sometimes more than two calls per second:

$ tail -f jvm-default.log | grep commit
[16:18:15.862] {pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:16.788] {pool-2-thread-1} end_commit_flush [16:18:21.721]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:22.073] {pool-2-thread-1} end_commit_flush [16:18:36.047]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:36.468] {pool-2-thread-1} end_commit_flush [16:18:36.886]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:37.017] {pool-2-thread-1} end_commit_flush [16:18:37.867]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:38.448] {pool-2-thread-1} end_commit_flush [16:18:44.375]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:47.016] {pool-2-thread-1} end_commit_flush [16:18:47.154]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:47.287] {pool-2-thread-1} end_commit_flush [16:18:50.399]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:51.283] {pool-2-thread-1} end_commit_flush [16:19:13.782]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:14.664] {pool-2-thread-1} end_commit_flush [16:19:15.081]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:15.215] {pool-2-thread-1} end_commit_flush [16:19:15.357]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:15.955] {pool-2-thread-1} end_commit_flush [16:19:16.421]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:19.791] {pool-2-thread-1} end_commit_flush [16:19:50.594]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:52.098] {pool-2-thread-1} end_commit_flush [16:19:52.236]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:52.368] {pool-2-thread-1} end_commit_flush [16:19:52.917]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:53.479] {pool-2-thread-1} end_commit_flush [16:19:54.920]
{pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:55.079] {pool-2-thread-1} end_commit_flush


additionally, in the solr admin page , the update handler reports as
many autocommits as commits - so i assume it is not some commit(); line
lost in my code.

I actually get the feeling that the commits are triggered more and more
often - with not-so-nice influence on indexing speed over time.
Restarting resin seems to get the commit rate to the original level.
Optimizing has no effect.
Is there some other parameter influencing autocommit?

Thank you very much.

Nickolai
  




RE: Question about autocommit

2008-11-19 Thread Nguyen, Joe
As far as I know, a commit can be triggered:

Manually
1.  by invoking the commit() method
Automatically
2.  by maxDocs
3.  by maxTime

Since document size is arbitrary and some documents can be huge,
could a commit also be triggered by the size of the memory buffer?

-Original Message-
From: Mark Miller [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 19, 2008 9:09 Joe
To: solr-user@lucene.apache.org
Subject: Re: Question about autocommit

They are separate commits. ramBufferSizeMB controls when the underlying
Lucene IndexWriter flushes ram to disk (this isnt like the IndexWriter
commiting or closing). The solr autocommit controls when solr asks
IndexWriter to commit what its done so far.

Nguyen, Joe wrote:
 Could ramBufferSizeMB trigger the commit in this case?  

 -Original Message-
 From: Nickolai Toupikov [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, November 19, 2008 8:36 Joe
 To: solr-user@lucene.apache.org
 Subject: Question about autocommit

 Hello,
 I would like some details on the autocommit mechanism. I tried to 
 search the wiki, but found only the standard maxDoc/time settings.
 i have set the autocommit parameters in solrconfig.xml to 8000 docs  
 and 30milis.
 Indexing at around  200 docs per second (from multiple processes, 
 using the CommonsHttpSolrServer class), i would have expected 
 autocommits to occur around every  40 seconds, however the jvm log 
 shows the following
 -  sometimes more than two calls per second:

 $ tail -f jvm-default.log | grep commit
 [16:18:15.862] {pool-2-thread-1} start
 commit(optimize=false,waitFlush=true,waitSearcher=true)
 [16:18:16.788] {pool-2-thread-1} end_commit_flush [16:18:21.721] 
 {pool-2-thread-1} start
 commit(optimize=false,waitFlush=true,waitSearcher=true)
 [16:18:22.073] {pool-2-thread-1} end_commit_flush [16:18:36.047] 
 {pool-2-thread-1} start
 commit(optimize=false,waitFlush=true,waitSearcher=true)
 [16:18:36.468] {pool-2-thread-1} end_commit_flush [16:18:36.886] 
 {pool-2-thread-1} start
 commit(optimize=false,waitFlush=true,waitSearcher=true)
 [16:18:37.017] {pool-2-thread-1} end_commit_flush [16:18:37.867] 
 {pool-2-thread-1} start
 commit(optimize=false,waitFlush=true,waitSearcher=true)
 [16:18:38.448] {pool-2-thread-1} end_commit_flush [16:18:44.375] 
 {pool-2-thread-1} start
 commit(optimize=false,waitFlush=true,waitSearcher=true)
 [16:18:47.016] {pool-2-thread-1} end_commit_flush [16:18:47.154] 
 {pool-2-thread-1} start
 commit(optimize=false,waitFlush=true,waitSearcher=true)
 [16:18:47.287] {pool-2-thread-1} end_commit_flush [16:18:50.399] 
 {pool-2-thread-1} start
 commit(optimize=false,waitFlush=true,waitSearcher=true)
 [16:18:51.283] {pool-2-thread-1} end_commit_flush [16:19:13.782] 
 {pool-2-thread-1} start
 commit(optimize=false,waitFlush=true,waitSearcher=true)
 [16:19:14.664] {pool-2-thread-1} end_commit_flush [16:19:15.081] 
 {pool-2-thread-1} start
 commit(optimize=false,waitFlush=true,waitSearcher=true)
 [16:19:15.215] {pool-2-thread-1} end_commit_flush [16:19:15.357] 
 {pool-2-thread-1} start
 commit(optimize=false,waitFlush=true,waitSearcher=true)
 [16:19:15.955] {pool-2-thread-1} end_commit_flush [16:19:16.421] 
 {pool-2-thread-1} start
 commit(optimize=false,waitFlush=true,waitSearcher=true)
 [16:19:19.791] {pool-2-thread-1} end_commit_flush [16:19:50.594] 
 {pool-2-thread-1} start
 commit(optimize=false,waitFlush=true,waitSearcher=true)
 [16:19:52.098] {pool-2-thread-1} end_commit_flush [16:19:52.236] 
 {pool-2-thread-1} start
 commit(optimize=false,waitFlush=true,waitSearcher=true)
 [16:19:52.368] {pool-2-thread-1} end_commit_flush [16:19:52.917] 
 {pool-2-thread-1} start
 commit(optimize=false,waitFlush=true,waitSearcher=true)
 [16:19:53.479] {pool-2-thread-1} end_commit_flush [16:19:54.920] 
 {pool-2-thread-1} start
 commit(optimize=false,waitFlush=true,waitSearcher=true)
 [16:19:55.079] {pool-2-thread-1} end_commit_flush


 additionally, in the solr admin page , the update handler reports as 
 many autocommits as commits - so i assume it is not some commit(); 
 line lost in my code.

 I actually get the feeling that the commits are triggered more and 
 more often - with not-so-nice influence on indexing speed over time.
 Restarting resin seems to get the commit rate to the original level.
 Optimizing has no effect.
 Is there some other parameter influencing autocommit?

 Thank you very much.

 Nickolai
   



Re: Question about autocommit

2008-11-19 Thread Nickolai Toupikov
The documents have an average size of about a kilobyte, I would say.
Bigger ones can pop up, but not nearly often enough to trigger
memory-commits every couple of seconds.
I don't have the exact figures, but I would expect the memory buffer
limit to be far beyond the 8000-document one in most cases.
Actually, I first started indexing with a 2000-document limit - a
commit expected every ten seconds or so.
Within a couple of hours the indexing speed choked down from over 200 to
under 100 documents per second - and all the same I had several
autocommits a second. So I restarted with the limit at 8000, with the
results I mentioned in the previous email.


Nguyen, Joe wrote:

As far as I know, commit could be triggered by

Manually
1.  invoke commit() method
Automatically
2.  maxDoc
3.  maxTime

Since the document size is arbitrary and some document could be huge,
could commit also be triggered by memory buffered size?

-Original Message-
From: Mark Miller [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 19, 2008 9:09 Joe

To: solr-user@lucene.apache.org
Subject: Re: Question about autocommit

They are separate commits. ramBufferSizeMB controls when the underlying
Lucene IndexWriter flushes ram to disk (this isnt like the IndexWriter
commiting or closing). The solr autocommit controls when solr asks
IndexWriter to commit what its done so far.

Nguyen, Joe wrote:
  
Could ramBufferSizeMB trigger the commit in this case?  


-Original Message-
From: Nickolai Toupikov [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 19, 2008 8:36 Joe
To: solr-user@lucene.apache.org
Subject: Question about autocommit

Hello,
I would like some details on the autocommit mechanism. I tried to 
search the wiki, but found only the standard maxDoc/time settings.
i have set the autocommit parameters in solrconfig.xml to 8000 docs  
and 30milis.
Indexing at around  200 docs per second (from multiple processes, 
using the CommonsHttpSolrServer class), i would have expected 
autocommits to occur around every  40 seconds, however the jvm log 
shows the following

-  sometimes more than two calls per second:

$ tail -f jvm-default.log | grep commit
[16:18:15.862] {pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:16.788] {pool-2-thread-1} end_commit_flush [16:18:21.721] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:22.073] {pool-2-thread-1} end_commit_flush [16:18:36.047] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:36.468] {pool-2-thread-1} end_commit_flush [16:18:36.886] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:37.017] {pool-2-thread-1} end_commit_flush [16:18:37.867] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:38.448] {pool-2-thread-1} end_commit_flush [16:18:44.375] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:47.016] {pool-2-thread-1} end_commit_flush [16:18:47.154] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:47.287] {pool-2-thread-1} end_commit_flush [16:18:50.399] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:51.283] {pool-2-thread-1} end_commit_flush [16:19:13.782] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:14.664] {pool-2-thread-1} end_commit_flush [16:19:15.081] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:15.215] {pool-2-thread-1} end_commit_flush [16:19:15.357] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:15.955] {pool-2-thread-1} end_commit_flush [16:19:16.421] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:19.791] {pool-2-thread-1} end_commit_flush [16:19:50.594] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:52.098] {pool-2-thread-1} end_commit_flush [16:19:52.236] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:52.368] {pool-2-thread-1} end_commit_flush [16:19:52.917] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:53.479] {pool-2-thread-1} end_commit_flush [16:19:54.920] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:55.079] {pool-2-thread-1} end_commit_flush


additionally, in the solr admin page , the update handler reports as 
many autocommits as commits - so i assume it is not some commit(); 
line lost in my code.


I actually get the feeling that the commits are triggered more and 
more often - with not-so-nice influence on indexing speed over time.

Restarting resin seems to get the commit rate to the original level.
Optimizing has no effect.
Is there some other parameter influencing autocommit?

Thank 

Re: Question about autocommit

2008-11-19 Thread Nickolai Toupikov
I don't know. After reading my last email, I realized I did not say
explicitly that by 'restarting' I merely meant 'restarting resin'. I
did not restart indexing from scratch. And - if I understand correctly -
if the merge factor was the culprit, restarting the servlet container
would have had no effect.


Nguyen, Joe wrote:

First it was fast, but after a couple of hours it slowed down...
Could mergeFactor affect the indexing speed, since Solr would take time
to merge multiple segments into a single one?

http://wiki.apache.org/solr/SolrPerformanceFactors#head-224d9a793c7c57d8662d5351f955ddf8c0a3ebcd


-Original Message-
From: Nickolai Toupikov [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 19, 2008 9:51 Joe

To: solr-user@lucene.apache.org
Subject: Re: Question about autocommit

The documents have an average size of about a kilobyte i would say. 
bigger ones can pop up,

 but not nearly often enough to trigger memory-commits every couple of
seconds.
I dont have the exact figures, but i would expect the memory buffer
limit to be far beyond the 8000 document  one in most of the cases.
actually i have first started indexing with a 2000 document limit - a
commit expected every ten seconds or so.
in a couple of hours the speed of indexing choked down from over 200 to
under 100 documents per second - and all the same i had several
autocommits a second. so i restarted with a limit  at 8000. with the
results i mentionned in the previous email.

Nguyen, Joe wrote:
  

As far as I know, commit could be triggered by

Manually
1.  invoke commit() method
Automatically
2.  maxDoc
3.  maxTime

Since the document size is arbitrary and some document could be huge, 
could commit also be triggered by memory buffered size?


-Original Message-
From: Mark Miller [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 19, 2008 9:09 Joe
To: solr-user@lucene.apache.org
Subject: Re: Question about autocommit

They are separate commits. ramBufferSizeMB controls when the 
underlying Lucene IndexWriter flushes ram to disk (this isnt like the 
IndexWriter commiting or closing). The solr autocommit controls when 
solr asks IndexWriter to commit what its done so far.


Nguyen, Joe wrote:
  

Could ramBufferSizeMB trigger the commit in this case?  


-Original Message-
From: Nickolai Toupikov [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 19, 2008 8:36 Joe
To: solr-user@lucene.apache.org
Subject: Question about autocommit

Hello,
I would like some details on the autocommit mechanism. I tried to 
search the wiki, but found only the standard maxDoc/time settings.
i have set the autocommit parameters in solrconfig.xml to 8000 docs 
and 30milis.
Indexing at around  200 docs per second (from multiple processes, 
using the CommonsHttpSolrServer class), i would have expected 
autocommits to occur around every  40 seconds, however the jvm log 
shows the following

-  sometimes more than two calls per second:

$ tail -f jvm-default.log | grep commit
[16:18:15.862] {pool-2-thread-1} start
commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:16.788] {pool-2-thread-1} end_commit_flush [16:18:21.721] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:22.073] {pool-2-thread-1} end_commit_flush [16:18:36.047] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:36.468] {pool-2-thread-1} end_commit_flush [16:18:36.886] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:37.017] {pool-2-thread-1} end_commit_flush [16:18:37.867] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:38.448] {pool-2-thread-1} end_commit_flush [16:18:44.375] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:47.016] {pool-2-thread-1} end_commit_flush [16:18:47.154] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:47.287] {pool-2-thread-1} end_commit_flush [16:18:50.399] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:18:51.283] {pool-2-thread-1} end_commit_flush [16:19:13.782] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:14.664] {pool-2-thread-1} end_commit_flush [16:19:15.081] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:15.215] {pool-2-thread-1} end_commit_flush [16:19:15.357] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:15.955] {pool-2-thread-1} end_commit_flush [16:19:16.421] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:19.791] {pool-2-thread-1} end_commit_flush [16:19:50.594] 
{pool-2-thread-1} start

commit(optimize=false,waitFlush=true,waitSearcher=true)
[16:19:52.098] {pool-2-thread-1} end_commit_flush [16:19:52.236] 
{pool-2-thread-1} start


Multi word Synonym

2008-11-19 Thread Jeff Newburn
I am trying to figure out how the synonym filter processes multi-word
inputs.  I have checked the analyzer in the GUI with some confusing results.
The indexed field has "The North Face" as a value. The synonym file has:

morthface, morth face, noethface, noeth face, norhtface, norht face,
nortface, nort face, northfac, north fac, northfac3e, north fac3e,
northface, north face, northfae, north fae, northfaqce, north faqce,
northfave, north fave, northhace, north hace, nothface, noth face,
thenorhface, the norh face, thenorth, the north, thenorthandface, the north
and face, thenortheface, the northe face, thenorthfac, the north fac,
thenorthface, thenorthfacee, the north facee, thenothface, the noth face,
thenotrhface, the notrh face, thenrothface, the nroth face, tnf => The North
Face

I have the field type using the WhitespaceTokenizer before the synonym
filter runs.  My confusion is that when the term "morth fac" is run, the
system somehow knows to map it to the correct term even though that exact
term is not present in the file.

How is this happening?  Is the synonym process tokenizing as well?

The datatype schema is as follows:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory"
        synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.EnglishPorterFilterFactory"
        protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>


-Jeff



No search result behavior (a la Amazon)

2008-11-19 Thread Caligula

It appears to me that Amazon is using a 100% minimum match policy.  If there
are no matches, they break down the original search terms and give
suggestion search results.

example:

http://www.amazon.com/s/ref=nb_ss_gw?url=search-alias%3Daps&field-keywords=ipod+nano+4th+generation+8gb+blue+calcium&x=0&y=0


Can Solr natively achieve something similar?  If not, can you suggest a way
to achieve this?  A custom RequestHandler?


Thanks!
-- 
View this message in context: 
http://www.nabble.com/No-search-result-behavior-%28a-la-Amazon%29-tp20587024p20587024.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: No search result behavior (a la Amazon)

2008-11-19 Thread Nguyen, Joe
Have a look at DisMaxRequestHandler and play with mm (minimum number of
terms that should match):

http://wiki.apache.org/solr/DisMaxRequestHandler?highlight=%28CategorySolrRequestHandler%29%7C%28%28CategorySolrRequestHandler%29%29#head-6c5fe41d68f3910ed544311435393f5727408e61


-Original Message-
From: Caligula [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 19, 2008 11:11 Joe
To: solr-user@lucene.apache.org
Subject: No search result behavior (a la Amazon)


It appears to me that Amazon is using a 100% minimum match policy.  If
there are no matches, they break down the original search terms and give
suggestion search results.

example:

http://www.amazon.com/s/ref=nb_ss_gw?url=search-alias%3Daps&field-keywords=ipod+nano+4th+generation+8gb+blue+calcium&x=0&y=0


Can Solr natively achieve something similar?  If not, can you suggest a
way to achieve this?  A custom RequestHandler?


Thanks!
--
View this message in context:
http://www.nabble.com/No-search-result-behavior-%28a-la-Amazon%29-tp20587024p20587024.html
Sent from the Solr - User mailing list archive at Nabble.com.



Solr schema Lucene's StandardAnalyser equivalent?

2008-11-19 Thread Glen Newton
Hello,

I am looking for the Solr schema equivalent to Lucene's StandardAnalyser.

Is it the Solr schema type:
 <fieldType name="text" class="solr.TextField">

Is there some way of directly invoking Lucene's StandardAnalyser?

Thanks,
Glen
-- 

-


RE: No search result behavior (a la Amazon)

2008-11-19 Thread Caligula

I understand how to do the 100% mm part.  It's the behavior when there are
no matches that I'm asking about :)



Nguyen, Joe-2 wrote:
 
 Have a look at DisMaxRequestHandler and play with mm (miminum terms
 should match)
 
 http://wiki.apache.org/solr/DisMaxRequestHandler?highlight=%28CategorySolrRequestHandler%29%7C%28%28CategorySolrRequestHandler%29%29#head-6c5fe41d68f3910ed544311435393f5727408e61
 
 
 -Original Message-
 From: Caligula [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, November 19, 2008 11:11 Joe
 To: solr-user@lucene.apache.org
 Subject: No search result behavior (a la Amazon)
 
 
 It appears to me that Amazon is using a 100% minimum match policy.  If
 there are no matches, they break down the original search terms and give
 suggestion search results.
 
 example:
 
 http://www.amazon.com/s/ref=nb_ss_gw?url=search-alias%3Daps&field-keywords=ipod+nano+4th+generation+8gb+blue+calcium&x=0&y=0
 
 
 Can Solr natively achieve something similar?  If not, can you suggest a
 way to achieve this?  A custom RequestHandler?
 
 
 Thanks!
 --
 View this message in context:
 http://www.nabble.com/No-search-result-behavior-%28a-la-Amazon%29-tp20587024p20587024.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 

-- 
View this message in context: 
http://www.nabble.com/No-search-result-behavior-%28a-la-Amazon%29-tp20587024p20587896.html
Sent from the Solr - User mailing list archive at Nabble.com.



filtering on blank OR specific range

2008-11-19 Thread Geoffrey Young
hi all :)

I'm having difficulty filtering my documents when a field is either
blank or set to a specific value.  I would have thought this would work:

  fq=-Type:[* TO *] OR Type:blue

which I would expect to find all documents where either Type is undefined
or Type is blue.  My actual result set is zero.

using a similar filter

  fq=-Type:[* TO *] OR OtherThing:cat

does what I would expect (documents with undefined type or documents
with cats), so it feels like solr is getting confused with the range
negation and ORing, but only when the field is the same.  adding various
parentheses makes no difference.

I know this is kind of nebulous sounding, but I was hoping someone would
look at this and go "you're doing it wrong; your filter should be..."

the field is defined as

  <field name="Type" type="string" indexed="true" stored="true"
multiValued="true"/>

if it matters.

tia

--Geoff


RE: filtering on blank OR specific range

2008-11-19 Thread Lance Norskog
Try:   Type:blue OR -Type:[* TO *] 

You can't have a negative clause at the beginning. Yes, Lucene should barf
about this.

-Original Message-
From: Geoffrey Young [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 19, 2008 12:17 PM
To: solr-user@lucene.apache.org
Subject: filtering on blank OR specific range

hi all :)

I'm having difficulty filtering my documents when a field is either blank
or set to a specific value.  I would have thought this would work:

  fq=-Type:[* TO *] OR Type:blue

which I would expect to find all documents where either Type is undefined or
Type is blue.  My actual result set is zero.

using a similar filter

  fq=-Type:[* TO *] OR OtherThing:cat

does what I would expect (documents with undefined type or documents with
cats), so it feels like solr is getting confused with the range negation and
ORing, but only when the field is the same.  adding various parentheses
makes no difference.

I know this is kind of nebulous sounding, but I was hoping someone would
look at this and go you're doing it wrong.  your filter should be...

the field is defined as

  <field name="Type" type="string" indexed="true" stored="true"
multiValued="true"/>

if it matters.

tia

--Geoff



Logging in Solr.

2008-11-19 Thread Erik Holstad
I kind of remember hearing that Solr was using SLF4J for logging, but I
haven't been able to find any information about it. In that case, where
do you set it to redirect to your log4j server, for example?

Regards Erik


Re: filtering on blank OR specific range

2008-11-19 Thread Geoffrey Young


Lance Norskog wrote:
 Try:   Type:blue OR -Type:[* TO *] 
 
 You can't have a negative clause at the beginning. Yes, Lucene should barf
 about this.

I did try that, before and again now, and still no luck.

anything else?

--Geoff
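For what it's worth, one form that often works for this kind of filter -
the usual caveat being that a purely negative subclause needs an explicit
*:* to subtract from - is:

  fq=Type:blue OR (*:* -Type:[* TO *])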


Re: Logging in Solr.

2008-11-19 Thread Ryan McKinley

the trunk (solr-1.4-dev) is now using SLF4J

If you are using the packaged .war, the behavior should be identical  
to 1.3 -- that is, it uses the java.util.logging implementation.


However, if you are using solr.jar, you select the logging framework
you actually want to use by including that binding in your
classpath.  For example, to use log4j you add slf4j-log4j12-1.5.5.jar to
your classpath, and then everything will behave as though it were
configured using log4j.


See: http://www.slf4j.org/ for more info

ryan



On Nov 19, 2008, at 4:21 PM, Erik Holstad wrote:

I kind if remember hearing that Solr was using SLF4J for the  
logging, but
I haven't been able to find any information about it. And in that  
case where

do you set it to redirect to you log4j server for example?

Regards Erik




RE: No search result behavior (a la Amazon)

2008-11-19 Thread Nguyen, Joe

It seems like its first search requires all terms to match.
If that finds nothing then, as you mentioned, it breaks the query down into
multiple smaller term sets, runs a search to get the total hit count for each
smaller term set, sorts the results by total hits, and displays a summary page.

Searching for A B C would be
1. q = +A +B +C   Match all terms
2. q = +A +B -C   Match A and B but not C
3. q = +A -B +C
4. q = 
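
A sketch of that fallback flow with SolrJ (assuming a local Solr at the
default port; the hard-coded term sets stand in for ones generated from the
user's query):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  public class FallbackSearch {
      public static void main(String[] args) throws Exception {
          SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

          // first pass: require every term
          long hits = server.query(new SolrQuery("+A +B +C").setRows(0))
                            .getResults().getNumFound();
          if (hits > 0) return;   // render the normal results page

          // no exact match: drop one term at a time, rank subsets by hit count
          String[] relaxed = { "+A +B -C", "+A -B +C", "-A +B +C" };
          for (String q : relaxed) {
              long n = server.query(new SolrQuery(q).setRows(0))
                             .getResults().getNumFound();
              System.out.println(q + " -> " + n + " hits");
          }
      }
  }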

 

-Original Message-
From: Caligula [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 19, 2008 11:52 Joe
To: solr-user@lucene.apache.org
Subject: RE: No search result behavior (a la Amazon)


I understand how to do the 100% mm part.  It's the behavior when there
are no matches that I'm asking about :)



Nguyen, Joe-2 wrote:
 
 Have a look at DisMaxRequestHandler and play with mm (minimum terms 
 should match)
 
 http://wiki.apache.org/solr/DisMaxRequestHandler?highlight=%28CategorySolrRequestHandler%29%7C%28%28CategorySolrRequestHandler%29%29#head-6c5fe41d68f3910ed544311435393f5727408e61
 
 
 -Original Message-
 From: Caligula [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, November 19, 2008 11:11 Joe
 To: solr-user@lucene.apache.org
 Subject: No search result behavior (a la Amazon)
 
 
 It appears to me that Amazon is using a 100% minimum match policy.  If

 there are no matches, they break down the original search terms and 
 give suggestion search results.
 
 example:
 
 http://www.amazon.com/s/ref=nb_ss_gw?url=search-alias%3Daps&field-keywords=ipod+nano+4th+generation+8gb+blue+calcium&x=0&y=0
 
 
 Can Solr natively achieve something similar?  If not, can you suggest 
 a way to achieve this?  A custom RequestHandler?
 
 
 Thanks!
 --
 View this message in context:
 http://www.nabble.com/No-search-result-behavior-%28a-la-Amazon%29-tp20587024p20587024.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 

--
View this message in context:
http://www.nabble.com/No-search-result-behavior-%28a-la-Amazon%29-tp20587024p20587896.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr schema Lucene's StandardAnalyser equivalent?

2008-11-19 Thread Otis Gospodnetic
Glen:

$ ff \*Standard\*java | grep analysis
./src/java/org/apache/solr/analysis/HTMLStripStandardTokenizerFactory.java
./src/java/org/apache/solr/analysis/StandardFilterFactory.java
./src/java/org/apache/solr/analysis/StandardTokenizerFactory.java


Does that do it?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





From: Glen Newton [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, November 19, 2008 2:49:26 PM
Subject: Solr schema Lucene's StandardAnalyser equivalent?

Hello,

I am looking for the Solr schema equivalent to Lucene's StandardAnalyser.

Is it the Solr schema type:
fieldType name=text class=solr.TextField

Is there some way of directly invoking Lucene's StandardAnalyser?

Thanks,
Glen
-- 

-


Searchable/indexable newsgroups

2008-11-19 Thread John Martyniak
Does anybody know of a good way to index newsgroups using SOLR?   
Basically would like to build a searchable list of newsgroup content.


Any help would be greatly appreciated.

-John



Re: Solr schema Lucene's StandardAnalyser equivalent?

2008-11-19 Thread Glen Newton
Thanks.

I've decided to use:
 fieldType name=textN class=solr.TextField positionIncrementGap=100 
   analyzer
     tokenizer class=solr.StandardTokenizerFactory/
     filter class=solr.StandardFilterFactory/
     filter class=solr.LowerCaseFilterFactory/
     filter class=solr.StopFilterFactory/
   /analyzer
 /fieldType


which appears to be close to what is found at
http://lucene.apache.org/java/2_3_1/api/index.html
Filters StandardTokenizer with StandardFilter, LowerCaseFilter and
StopFilter, using a list of English stop words.

-Glen

2008/11/19 Otis Gospodnetic [EMAIL PROTECTED]:
 Glen:

 $ ff \*Standard\*java | grep analysis
 ./src/java/org/apache/solr/analysis/HTMLStripStandardTokenizerFactory.java
 ./src/java/org/apache/solr/analysis/StandardFilterFactory.java
 ./src/java/org/apache/solr/analysis/StandardTokenizerFactory.java


 Does that do it?

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch




 
 From: Glen Newton [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Wednesday, November 19, 2008 2:49:26 PM
 Subject: Solr schema Lucene's StandardAnalyser equivalent?

 Hello,

 I am looking for the Solr schema equivalent to Lucene's StandardAnalyser.

 Is it the Solr schema type:
 fieldType name=text class=solr.TextField

 Is there some way of directly invoking Lucene's StandardAnalyser?

 Thanks,
 Glen
 --

 -




-- 

-


RE: Searchable/indexable newsgroups

2008-11-19 Thread Feak, Todd
Can Nutch crawl newsgroups? Anyone?

-Todd Feak

-Original Message-
From: John Martyniak [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 19, 2008 3:06 PM
To: solr-user@lucene.apache.org
Subject: Searchable/indexable newsgroups

Does anybody know of a good way to index newsgroups using SOLR?   
Basically would like to build a searchable list of newsgroup content.

Any help would be greatly appreciated.

-John




Solr schema 1.3 - 1.4-dev (changes?)

2008-11-19 Thread Jon Baer

Hi,

I wanted to try the TermVectorComponent w/ current schema setup and I  
did a build off trunk but it's giving me something like ...


org.apache.solr.common.SolrException: ERROR:unknown field 'DOCTYPE'

Even though it is declared in schema.xml (lowercase), before I grep  
replace the entire file would that be my issue?


Thanks.

- Jon


Re: Solr schema 1.3 - 1.4-dev (changes?)

2008-11-19 Thread Ryan McKinley

schema fields should be case sensitive...  so DOCTYPE != doctype

is the behavior different for you in 1.3 with the same file/schema?


On Nov 19, 2008, at 6:26 PM, Jon Baer wrote:


Hi,

I wanted to try the TermVectorComponent w/ current schema setup and  
I did a build off trunk but it's giving me something like ...


org.apache.solr.common.SolrException: ERROR:unknown field 'DOCTYPE'

Even though it is declared in schema.xml (lowercase), before I grep  
replace the entire file would that be my issue?


Thanks.

- Jon




Re: Solr schema Lucene's StandardAnalyser equivalent?

2008-11-19 Thread Erik Hatcher
Note that you can use a standard Lucene Analyzer subclass too.  The  
example schema shows how with this commented out:


   fieldType name=text_greek class=solr.TextField
  analyzer class=org.apache.lucene.analysis.el.GreekAnalyzer/
/fieldType

  Erik
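
By the same mechanism, the direct answer to Glen's original question should
be a one-liner, since StandardAnalyzer has a no-argument constructor in
Lucene 2.x (an untested sketch):

  <fieldType name="text_std" class="solr.TextField">
    <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
  </fieldType>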



On Nov 19, 2008, at 6:24 PM, Glen Newton wrote:


Thanks.

I've decided to use:
fieldType name=textN class=solr.TextField positionIncrementGap=100
  analyzer
    tokenizer class=solr.StandardTokenizerFactory/
    filter class=solr.StandardFilterFactory/
    filter class=solr.LowerCaseFilterFactory/
    filter class=solr.StopFilterFactory/
  /analyzer
/fieldType


which appears to be close to what is found at
http://lucene.apache.org/java/2_3_1/api/index.html
Filters StandardTokenizer with StandardFilter, LowerCaseFilter and
StopFilter, using a list of English stop words.

-Glen

2008/11/19 Otis Gospodnetic [EMAIL PROTECTED]:

Glen:

$ ff \*Standard\*java | grep analysis
./src/java/org/apache/solr/analysis/ 
HTMLStripStandardTokenizerFactory.java

./src/java/org/apache/solr/analysis/StandardFilterFactory.java
./src/java/org/apache/solr/analysis/StandardTokenizerFactory.java


Does that do it?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





From: Glen Newton [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, November 19, 2008 2:49:26 PM
Subject: Solr schema Lucene's StandardAnalyser equivalent?

Hello,

I am looking for the Solr schema equivalent to Lucene's  
StandardAnalyser.


Is it the Solr schema type:
fieldType name=text class=solr.TextField

Is there some way of directly invoking Lucene's StandardAnalyser?

Thanks,
Glen
--

-





--

-




Re: Newbe! Trying to run solr-1.3.0 under tomcat. Please help

2008-11-19 Thread James liu
Check procedure:
1: rm -r $tomcat/webapps/*
2: rm -r $solr/data   (your index data directory)
3: check any xml you modified
4: start tomcat

I had the same error, but I forgot how I fixed it... so you can use my check
procedure; I think it will help you.


I use tomcat+solr on win2003, freebsd, and mac osx 10.5.5; they all work well.

-- 
regards
j.L


Re: posting error in solr

2008-11-19 Thread James liu
First, make sure the xml is utf-8 and the field values are utf-8;
second, you should post the xml as utf-8.


My advice: use utf-8 for all encoding...

It makes my solr work well, and I use Chinese.

-- 
regards
j.L


Tomcat undeploy/shutdown exception

2008-11-19 Thread Erik Hatcher
In analyzing a client's Solr logs, from Tomcat, I came across the  
exception below.  Anyone encountered issues with Tomcat shutdowns or  
undeploys of Solr contexts?  I'm not sure if this is an anomaly due to  
some wonky Tomcat handling, or if this is some kind of bug in Solr.  I  
haven't actually duplicated the issue myself though.


Thanks,
Erik


Oct 29, 2008 10:14:31 AM org.apache.catalina.startup.HostConfig undeployApps
WARNING: Error while removing context [/search]
java.lang.NullPointerException
	at org.apache.solr.servlet.SolrDispatchFilter.destroy(SolrDispatchFilter.java:123)
	at org.apache.catalina.core.ApplicationFilterConfig.release(ApplicationFilterConfig.java:253)
	at org.apache.catalina.core.StandardContext.filterStop(StandardContext.java:3670)
	at org.apache.catalina.core.StandardContext.stop(StandardContext.java:4354)
	at org.apache.catalina.core.ContainerBase.removeChild(ContainerBase.java:893)
	at org.apache.catalina.startup.HostConfig.undeployApps(HostConfig.java:1191)
	at org.apache.catalina.startup.HostConfig.stop(HostConfig.java:1162)
	at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:313)
	at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120)
	at org.apache.catalina.core.ContainerBase.stop(ContainerBase.java:1055)
	at org.apache.catalina.core.ContainerBase.stop(ContainerBase.java:1067)
	at org.apache.catalina.core.StandardEngine.stop(StandardEngine.java:448)
	at org.apache.catalina.core.StandardService.stop(StandardService.java:510)
	at org.apache.catalina.core.StandardServer.stop(StandardServer.java:734)
	at org.apache.catalina.startup.Catalina.stop(Catalina.java:602)
	at org.apache.catalina.startup.Catalina.start(Catalina.java:577)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:295)
	at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:433)



Re: Solr schema 1.3 - 1.4-dev (changes?)

2008-11-19 Thread Jon Baer
Sorry I should have mentioned this is from using the  
DataImportHandler ... it seems case insensitive ... ie my columns are  
UPPERCASE and schema field names are lowercase and it works fine in  
1.3 but not in 1.4 ... it seems strict.  Going to resolve all the  
field names to uppercase to see if that resolves the problem.  Thanks.


- Jon

On Nov 19, 2008, at 6:44 PM, Ryan McKinley wrote:


schema fields should be case sensitive...  so DOCTYPE != doctype

is the behavior different for you in 1.3 with the same file/schema?


On Nov 19, 2008, at 6:26 PM, Jon Baer wrote:


Hi,

I wanted to try the TermVectorComponent w/ current schema setup and  
I did a build off trunk but it's giving me something like ...


org.apache.solr.common.SolrException: ERROR:unknown field 'DOCTYPE'

Even though it is declared in schema.xml (lowercase), before I grep  
replace the entire file would that be my issue?


Thanks.

- Jon






Re: Solr schema 1.3 - 1.4-dev (changes?)

2008-11-19 Thread Noble Paul നോബിള്‍ नोब्ळ्
Hi Jon,
that is probably not the expected behavior.

only 'explicit' fields must be case-sensitive.

Could you tell me the usecase or can you paste the data-config?

--Noble






On Thu, Nov 20, 2008 at 8:55 AM, Jon Baer [EMAIL PROTECTED] wrote:
 Sorry I should have mentioned this is from using the DataImportHandler ...
 it seems case insensitive ... ie my columns are UPPERCASE and schema field
 names are lowercase and it works fine in 1.3 but not in 1.4 ... it seems
 strict.  Going to resolve all the field names to uppercase to see if that
 resolves the problem.  Thanks.

 - Jon

 On Nov 19, 2008, at 6:44 PM, Ryan McKinley wrote:

 schema fields should be case sensitive...  so DOCTYPE != doctype

 is the behavior different for you in 1.3 with the same file/schema?


 On Nov 19, 2008, at 6:26 PM, Jon Baer wrote:

 Hi,

 I wanted to try the TermVectorComponent w/ current schema setup and I did
 a build off trunk but it's giving me something like ...

 org.apache.solr.common.SolrException: ERROR:unknown field 'DOCTYPE'

 Even though it is declared in schema.xml (lowercase), before I grep
 replace the entire file would that be my issue?

 Thanks.

 - Jon






-- 
--Noble Paul


Re: DataImportHandler: Javascript transformer for splitting field-values

2008-11-19 Thread Noble Paul നോബിള്‍ नोब्ळ्
unfortunately native JS objects are not handled by the ScriptTransformer yet.

but what you can do in the script is create a new
java.util.ArrayList() and add each item into that.

something like
var jsarr = ['term','term','term'];
var arr = new java.util.ArrayList();
for (var i = 0; i < jsarr.length; i++) arr.add(jsarr[i]);
row.put('terms', arr);
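
applied to the splitTerms function quoted below, a sketch (assuming the term
and count columns from the query, and Rhino's usual unwrapping of Java
numbers in the loop condition):

function splitTerms(row) {
    var count = row.get('count');
    var terms = new java.util.ArrayList();
    for (var i = 0; i < count; i++) {
        terms.add(row.get('term'));   // duplicate the term 'count' times
    }
    row.put('terms', terms);
    return row;
}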

On Wed, Nov 19, 2008 at 9:03 PM, Steffen [EMAIL PROTECTED] wrote:
 Hi everyone,
 I'm currently working with the nightly build of Solr (solr-2008-11-17)
 and trying to figure out how to transform a row-object with Javascript
 to include multiple values (in a single multivalued field). When I try
 something like this as a transformer:
 function splitTerms(row) {
//each term should be duplicated into count 
 field-values
//dummy-code to show the idea
row.put('terms',['term','term','term']);
return row;
 }
 [...]
 entity name=searchfeedback pk=id transformer=script:splitTerms
 query=SELECT term,count FROM termtable WHERE id=${parent.id} /

 The DataImportHandler debugger returns:
 arr
  str
sun.org.mozilla.javascript.internal.NativeArray:[EMAIL PROTECTED]
  /str
 /arr
 What it *should* return:
 arr
  strterm/str
  strterm/str
  strterm/str
 /arr

 So, what am I doing wrong? My transformer will be invoked multiple
 times from a MySQL-Query and in turn has to insert multiple values to
 the same field during each invocation. It should do something similar
 to the RegexTransformer (field splitBy)... is that possible? Right now
 I have to use a workaround that includes the term-duplication on the
 database sides, which is kinda ugly if a term has to be duplicated a
 lot.
 Greetings,
 Steffen




-- 
--Noble Paul


Re: Solr schema 1.3 - 1.4-dev (changes?)

2008-11-19 Thread Jon Baer

Schema:
field name=docid type=string indexed=true stored=true/

DIH:
field column=DOCID name=docid template=PLAYER-${players.PLAYERID}/


The column is uppercase ... isn't there some automagic happening now  
where DIH will introspect the fields @ load time?


- Jon

On Nov 19, 2008, at 11:11 PM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:



Hi Jon,
that is probably not the expected behavior.

only 'explicit' fields must be case-sensitive.

Could you tell me the usecase or can you paste the data-config?

--Noble






On Thu, Nov 20, 2008 at 8:55 AM, Jon Baer [EMAIL PROTECTED] wrote:
Sorry I should have mentioned this is from using the  
DataImportHandler ...
it seems case insensitive ... ie my columns are UPPERCASE and  
schema field
names are lowercase and it works fine in 1.3 but not in 1.4 ... it  
seems
strict.  Going to resolve all the field names to uppercase to see  
if that

resolves the problem.  Thanks.

- Jon

On Nov 19, 2008, at 6:44 PM, Ryan McKinley wrote:


schema fields should be case sensitive...  so DOCTYPE != doctype

is the behavior different for you in 1.3 with the same file/schema?


On Nov 19, 2008, at 6:26 PM, Jon Baer wrote:


Hi,

I wanted to try the TermVectorComponent w/ current schema setup  
and I did

a build off trunk but it's giving me something like ...

org.apache.solr.common.SolrException: ERROR:unknown field 'DOCTYPE'

Even though it is declared in schema.xml (lowercase), before I grep
replace the entire file would that be my issue?

Thanks.

- Jon









--
--Noble Paul




Re: Solr schema 1.3 - 1.4-dev (changes?)

2008-11-19 Thread Noble Paul നോബിള്‍ नोब्ळ्
So originally you had the field declaration as follows, right?
field column=DOCID template=PLAYER-${players.PLAYERID}/

we did some refactoring to minimize the object creation for
case-insensitive comparisons.

I guess it should be rectified soon.

Thanks for bringing it to our notice.
--Noble





On Thu, Nov 20, 2008 at 10:05 AM, Jon Baer [EMAIL PROTECTED] wrote:
 Schema:
 field name=docid type=string indexed=true stored=true/

 DIH:
 field column=DOCID name=docid template=PLAYER-${players.PLAYERID}/

 The column is uppercase ... isn't there some automagic happening now where
 DIH will introspect the fields @ load time?

 - Jon

 On Nov 19, 2008, at 11:11 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

 Hi Jon,
 that is probably not the expected behavior.

 only 'explicit' fields must be case-sensitive.

 Could you tell me the usecase or can you paste the data-config?

 --Noble






 On Thu, Nov 20, 2008 at 8:55 AM, Jon Baer [EMAIL PROTECTED] wrote:

 Sorry I should have mentioned this is from using the DataImportHandler
 ...
 it seems case insensitive ... ie my columns are UPPERCASE and schema
 field
 names are lowercase and it works fine in 1.3 but not in 1.4 ... it seems
 strict.  Going to resolve all the field names to uppercase to see if that
 resolves the problem.  Thanks.

 - Jon

 On Nov 19, 2008, at 6:44 PM, Ryan McKinley wrote:

 schema fields should be case sensitive...  so DOCTYPE != doctype

 is the behavior different for you in 1.3 with the same file/schema?


 On Nov 19, 2008, at 6:26 PM, Jon Baer wrote:

 Hi,

 I wanted to try the TermVectorComponent w/ current schema setup and I
 did
 a build off trunk but it's giving me something like ...

 org.apache.solr.common.SolrException: ERROR:unknown field 'DOCTYPE'

 Even though it is declared in schema.xml (lowercase), before I grep
 replace the entire file would that be my issue?

 Thanks.

 - Jon






 --
 --Noble Paul





-- 
--Noble Paul


Re: Solr schema 1.3 - 1.4-dev (changes?)

2008-11-19 Thread Jon Baer
Correct ... it is the unfortunate side effect of having some legacy  
tables in uppercase :-\  I thought the explicit declaration of the field  
name attribute was ok.


- Jon

On Nov 19, 2008, at 11:53 PM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:



So originally you had the field declaration as follows, right?
field column=DOCID template=PLAYER-${players.PLAYERID}/

we did some refactoring to minimize the object creation for
case-insensitive comparisons.

I guess it should be rectified soon.

Thanks for bringing it to our notice.
--Noble





On Thu, Nov 20, 2008 at 10:05 AM, Jon Baer [EMAIL PROTECTED] wrote:

Schema:
field name=docid type=string indexed=true stored=true/

DIH:
field column=DOCID name=docid template=PLAYER-${players.PLAYERID}/


The column is uppercase ... isn't there some automagic happening  
now where

DIH will introspect the fields @ load time?

- Jon

On Nov 19, 2008, at 11:11 PM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:



Hi Jon,
that is probably not the expected behavior.

only 'explicit' fields must be case-sensitive.

Could you tell me the usecase or can you paste the data-config?

--Noble






On Thu, Nov 20, 2008 at 8:55 AM, Jon Baer [EMAIL PROTECTED] wrote:


Sorry I should have mentioned this is from using the  
DataImportHandler

...
it seems case insensitive ... ie my columns are UPPERCASE and  
schema

field
names are lowercase and it works fine in 1.3 but not in 1.4 ...  
it seems
strict.  Going to resolve all the field names to uppercase to see  
if that

resolves the problem.  Thanks.

- Jon

On Nov 19, 2008, at 6:44 PM, Ryan McKinley wrote:


schema fields should be case sensitive...  so DOCTYPE != doctype

is the behavior different for you in 1.3 with the same file/ 
schema?



On Nov 19, 2008, at 6:26 PM, Jon Baer wrote:


Hi,

I wanted to try the TermVectorComponent w/ current schema setup  
and I

did
a build off trunk but it's giving me something like ...

org.apache.solr.common.SolrException: ERROR:unknown field  
'DOCTYPE'


Even though it is declared in schema.xml (lowercase), before I  
grep

replace the entire file would that be my issue?

Thanks.

- Jon









--
--Noble Paul







--
--Noble Paul




Field collapsing (SOLR-236) and Solr 1.3.0 release version

2008-11-19 Thread Stephen Weiss

Hi,

A requirement has come up in a project where we're going to need to  
group by a field in the result set.  I looked into the SOLR-236 patch  
and it seems there are a couple versions out now that are supposed to  
work against the Solr 1.3.0 release.


This is a production site; it really can't be running anything that's  
going to crash or take up too many resources.  I wanted to check with  
the list and see if anyone is using this patch with the Solr 1.3.0  
release and whether it is stable enough / performs well enough for serious  
usage.  We have an index of 3M+ documents, and a grouped result set  
would be about 50-75% of the total size of the ungrouped results.


Thanks for any information or pointers.

--
Steve Weiss
Stylesight


Re: Error in indexing timestamp format.

2008-11-19 Thread con


Hi Noble
Thanks for your update.

Sorry, that was a typo; I put the same name for both source and dest.
Actually I failed to remove it at some stage of trial and error.
I removed the copyField as it is not really necessary at this stage. 

My scenario is like this:
I have various date fields in my database, with a format like: 22-10-08
03:57:11.63700 PM.
I want to index and search these dates. Even after making the above updates
I am still not able to index or search these values. 

Expecting your reply
Thanks in advance
con





Noble Paul നോബിള്‍ नोब्ळ् wrote:
 
 could you explain to me what the purpose of this line is?
 copyField source=CREATED_DATE dest=CREATED_DATE/
 
 I mean, what are you trying to achieve?
 Where did you get the documentation for copyField? Maybe I need to check
 it out
 
 On Wed, Nov 19, 2008 at 3:29 PM, con [EMAIL PROTECTED] wrote:


 Hi Nobble,
 Thank you very much
 That removed the error while server startup.

 But I don't think the data is getting indexed upon running the
 dataimport. I
 am unable to display the date field values on searching.
 This is my complete configs:

 entity name=employees
 transformer=TemplateTransformer,DateFormatTransformer pk=EMP_ID
 query=select EMP_ID, CREATED_DATE, CUST_ID FROM EMP, CUST where
 EMP.EMP_ID
 = CUST.EMP_ID 
   field column=rowtype template=employees /
   field column=EMP_ID name=EMP_ID /
   field column=CUST_ID name=CUST_ID /
   field column=CREATED_DATE sourceColName=CREATED_DATE
 dateTimeFormat=dd-MM-yy HH:mm:ss.S a /
 /entity

 In the schema.xml I have:
 field name=CREATED_DATE type=date indexed=true stored=true /

   copyField source=CREATED_DATE dest=CREATED_DATE/


 Do I need some other configurations.

 Thanks in advance
 con






 Noble Paul നോബിള്‍ नोब्ळ् wrote:

 sorry I meant wrong dest field name

 On Wed, Nov 19, 2008 at 12:41 PM, con [EMAIL PROTECTED] wrote:

 Hi Nobble

 I have cross checked. This is my copy field of schema.xml

   copyField source=CREATED_DATE dest=date /

 I am still getting that error.

 thanks
 con



 Noble Paul നോബിള്‍ नोब्ळ् wrote:

 your copyField has the wrong source field name. The field name is not
 'date', it is 'CREATED_DATE'

 On Wed, Nov 19, 2008 at 11:49 AM, con [EMAIL PROTECTED] wrote:

 Hi Shalin
 Please find the log data.

 10:18:30,819 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.servlet.SolrDispatchFilter init
 INFO: SolrDispatchFilter.init()
 10:18:30,838 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader locateInstanceDir
 INFO: No /solr/home in JNDI
 10:18:30,839 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader locateInstanceDir
 INFO: using system property solr.solr.home: C:\Search\solr
 10:18:30,844 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.CoreContainer$Initializer initialize
 INFO: looking for solr.xml: C:\Search\solr\solr.xml
 10:18:30,845 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader init
 INFO: Solr home set to 'C:\Search\solr/'
 10:18:30,846 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader createClassLoader
 INFO: Adding 'file:/C:/Search/solr/lib/jetty-6.1.3.jar' to Solr
 classloader
 10:18:30,847 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader createClassLoader
 INFO: Adding 'file:/C:/Search/solr/lib/jetty-util-6.1.3.jar' to Solr
 classloader
 10:18:30,848 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader createClassLoader
 INFO: Adding 'file:/C:/Search/solr/lib/jsp-2.1/' to Solr classloader
 10:18:30,848 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader createClassLoader
 INFO: Adding 'file:/C:/Search/solr/lib/ojdbc6-11.1.0.6.0.1.jar' to
 Solr
 classloader
 10:18:30,849 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader createClassLoader
 INFO: Adding 'file:/C:/Search/solr/lib/servlet-api-2.5-6.1.3.jar' to
 Solr
 classloader
 10:18:30,864 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.CoreContainer load
 INFO: loading shared library: C:\Search\solr\lib
 10:18:30,867 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader createClassLoader
 INFO: Adding 'file:/C:/Search/solr/lib/jetty-6.1.3.jar' to Solr
 classloader
 10:18:30,870 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader createClassLoader
 INFO: Adding 'file:/C:/Search/solr/lib/jetty-util-6.1.3.jar' to Solr
 classloader
 10:18:30,870 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader createClassLoader
 INFO: Adding 'file:/C:/Search/solr/lib/jsp-2.1/' to Solr classloader
 10:18:30,871 ERROR [STDERR] 19 Nov, 2008 10:18:30 AM
 org.apache.solr.core.SolrResourceLoader createClassLoader
 INFO: Adding 'file:/C:/Search/solr/lib/ojdbc6-11.1.0.6.0.1.jar' to
 Solr
 classloader
 10:18:30,872 ERROR 
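
One thing worth checking in the entity config quoted above: the
DateFormatTransformer parses with java.text.SimpleDateFormat, where HH
means hour-of-day (0-23) while hh means hour (1-12) for use with the AM/PM
marker a. Since the source values look like 03:57:11.63700 PM, the pattern
probably needs hh; a sketch of the corrected field mapping:

  <field column="CREATED_DATE" sourceColName="CREATED_DATE"
         dateTimeFormat="dd-MM-yy hh:mm:ss.S a" />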

RE: Unique id

2008-11-19 Thread Raghunandan Rao
Basically, I am working on two views. The first one has an ID column; the
second view has no unique ID column. What should I do in such situations?
There are 3 other columns that I could make a composite key out of (see the
sketch after the quoted reply below). I have to index these two views now. 


-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 19, 2008 5:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Unique id

Technically, no, a uniqueKey field is NOT required.  I've yet to run  
into a situation where it made sense not to use one though.

As for indexing database tables - if one of your tables doesn't have a  
primary key, does it have an aggregate unique key of some sort?  Do  
you plan on updating the rows in that table and reindexing them?   
Seems like some kind of unique key would make sense for updating  
documents.

But yeah, a more detailed description of your table structure and  
searching needs would be helpful.

Erik


On Nov 19, 2008, at 5:18 AM, Aleksander M. Stensby wrote:

 Yes it is. You need a unique id because the add method works as an  
 add or update method. When adding a document whose ID is already  
 found in the index, the old document will be deleted and the new  
 will be added. Are you indexing two tables into the same index? Or  
 does one entry in the index consist of data from both tables? How  
 are these linked together without an ID?

 - Aleksander

 On Wed, 19 Nov 2008 10:42:00 +0100, Raghunandan Rao
[EMAIL PROTECTED] 
  wrote:

 Hi,

 Is the uniqueKey in schema.xml really required?


 Reason is, I am indexing two tables and I have id as unique key in
 schema.xml but id field is not there in one of the tables and  
 indexing
 fails. Do I really require this unique field for Solr to index it  
 better
 or can I do away with this?


 Thanks,

 Rahgu




 -- 
 Aleksander M. Stensby
 Senior software developer
 Integrasco A/S
 www.integrasco.no
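
For the composite key, one option is to assemble the uniqueKey yourself on
the SolrJ side; a sketch, with hypothetical column names colA/colB/colC
standing in for the three real columns:

  import org.apache.solr.client.solrj.beans.Field;

  public class ViewTwoRow {
      // uniqueKey assembled from the three columns that are unique in combination
      @Field("id")
      public String id;

      @Field public String colA;
      @Field public String colB;
      @Field public String colC;

      public ViewTwoRow(String a, String b, String c) {
          this.colA = a;
          this.colB = b;
          this.colC = c;
          this.id = a + "-" + b + "-" + c;
      }
  }

Rows are then indexed as usual with server.addBean(new ViewTwoRow(a, b, c));
re-adding a row with the same three column values updates the existing
document instead of creating a duplicate.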



Re: Tomcat undeploy/shutdown exception

2008-11-19 Thread Shalin Shekhar Mangar
Erik, which Solr version is that stack trace from?

On Thu, Nov 20, 2008 at 7:57 AM, Erik Hatcher [EMAIL PROTECTED]wrote:

 In analyzing a client's Solr logs, from Tomcat, I came across the exception
 below.  Anyone encountered issues with Tomcat shutdowns or undeploys of Solr
 contexts?  I'm not sure if this is an anomaly due to some wonky Tomcat
 handling, or if this is some kind of bug in Solr.  I haven't actually
 duplicated the issue myself though.

 Thanks,
Erik


 Oct 29, 2008 10:14:31 AM org.apache.catalina.startup.HostConfig
 undeployApps
 WARNING: Error while removing context [/search]
 java.lang.NullPointerException
at
 org.apache.solr.servlet.SolrDispatchFilter.destroy(SolrDispatchFilter.java:123)
at
 org.apache.catalina.core.ApplicationFilterConfig.release(ApplicationFilterConfig.java:253)
at
 org.apache.catalina.core.StandardContext.filterStop(StandardContext.java:3670)
at
 org.apache.catalina.core.StandardContext.stop(StandardContext.java:4354)
at
 org.apache.catalina.core.ContainerBase.removeChild(ContainerBase.java:893)
at
 org.apache.catalina.startup.HostConfig.undeployApps(HostConfig.java:1191)
at org.apache.catalina.startup.HostConfig.stop(HostConfig.java:1162)
at
 org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:313)
at
 org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120)
at
 org.apache.catalina.core.ContainerBase.stop(ContainerBase.java:1055)
at
 org.apache.catalina.core.ContainerBase.stop(ContainerBase.java:1067)
at
 org.apache.catalina.core.StandardEngine.stop(StandardEngine.java:448)
at
 org.apache.catalina.core.StandardService.stop(StandardService.java:510)
at
 org.apache.catalina.core.StandardServer.stop(StandardServer.java:734)
at org.apache.catalina.startup.Catalina.stop(Catalina.java:602)
at org.apache.catalina.startup.Catalina.start(Catalina.java:577)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:295)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:433)




-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr schema 1.3 - 1.4-dev (changes?)

2008-11-19 Thread Shalin Shekhar Mangar
Jon, I just committed a fix for this issue at
https://issues.apache.org/jira/browse/SOLR-873

Can you please use trunk and see if it solved your problem?
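
Grabbing and building trunk at the time looked roughly like this (paths as
of late 2008):

  svn checkout http://svn.apache.org/repos/asf/lucene/solr/trunk solr-trunk
  cd solr-trunk
  ant dist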

On Thu, Nov 20, 2008 at 10:32 AM, Jon Baer [EMAIL PROTECTED] wrote:

 Correct ... it is the unfortunate side effect of having some legacy tables
 in uppercase :-\  I thought the explicit declaration of the field name attribute
 was ok.

 - Jon


 On Nov 19, 2008, at 11:53 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

 So originally you had the field declaration as follows, right?
 field column=DOCID template=PLAYER-${players.PLAYERID}/

 we did some refactoring to minimize the object creation for
 case-insensitive comparisons.

 I guess it should be rectified soon.

 Thanks for bringing it to our notice.
 --Noble





 On Thu, Nov 20, 2008 at 10:05 AM, Jon Baer [EMAIL PROTECTED] wrote:

 Schema:
 field name=docid type=string indexed=true stored=true/

 DIH:
 field column=DOCID name=docid
 template=PLAYER-${players.PLAYERID}/

 The column is uppercase ... isn't there some automagic happening now
 where
 DIH will introspect the fields @ load time?

 - Jon

 On Nov 19, 2008, at 11:11 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

 Hi Jon,
 that is probably not the expected behavior.

 only 'explicit' fields must be case-sensitive.

 Could you tell me the usecase or can you paste the data-config?

 --Noble






 On Thu, Nov 20, 2008 at 8:55 AM, Jon Baer [EMAIL PROTECTED] wrote:


 Sorry I should have mentioned this is from using the DataImportHandler
 ...
 it seems case insensitive ... ie my columns are UPPERCASE and schema
 field
 names are lowercase and it works fine in 1.3 but not in 1.4 ... it
 seems
 strict.  Going to resolve all the field names to uppercase to see if
 that
 resolves the problem.  Thanks.

 - Jon

 On Nov 19, 2008, at 6:44 PM, Ryan McKinley wrote:

  schema fields should be case sensitive...  so DOCTYPE != doctype

 is the behavior different for you in 1.3 with the same file/schema?


 On Nov 19, 2008, at 6:26 PM, Jon Baer wrote:

  Hi,

 I wanted to try the TermVectorComponent w/ current schema setup and I
 did
 a build off trunk but it's giving me something like ...

 org.apache.solr.common.SolrException: ERROR:unknown field 'DOCTYPE'

 Even though it is declared in schema.xml (lowercase), before I grep
 replace the entire file would that be my issue?

 Thanks.

 - Jon







 --
 --Noble Paul






 --
 --Noble Paul





-- 
Regards,
Shalin Shekhar Mangar.