parsing many documents takes too long

2011-08-12 Thread Tri Nguyen
Hi,
 
My query to Solr returns about 982 documents, and I use JAXB to parse them
into Java objects.  That takes about 469 ms, which is over my 150-200 ms
threshold.
 
Is there a way around this?  Can I store the Java objects in the index, return
them in the Solr response, and then deserialize them back into Java objects?
Would this take less time?
 
Any other ideas?
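
One idea worth benchmarking: skip the XML-to-object step entirely and fetch the
results with SolrJ, whose default binary (javabin) response format materializes
SolrDocument objects without any XML parsing.  A minimal sketch (URL, query,
and field name are assumptions):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class JavabinFetch {
    public static void main(String[] args) throws Exception {
        // SolrJ talks javabin by default, so no XML is parsed on the client
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8080/solr");
        QueryResponse rsp = server.query(new SolrQuery("*:*").setRows(1000));
        for (SolrDocument doc : rsp.getResults()) {
            // map fields straight into your own objects here
            System.out.println(doc.getFieldValue("id"));
        }
    }
}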
 
Thanks,
 
Tri

sorting distance in solr 1.4.1

2011-08-12 Thread Tri Nguyen
Hi,
 
We are using Solr 1.4.1 and we need to sort our results by distance.  We have
a lat/lon for each document in the response, plus our reference point.
 
Is it possible?  I read about the spatial plugin, but that only does range
searching:

http://blog.jayway.com/2010/10/27/geo-search-with-spatial-solr-plugin/

It doesn't talk about sorting the results by distance (as supported by Solr 3.1).
 
Tri

class not found

2011-04-07 Thread Tri Nguyen
Hi,

I wrote my own parser plugin.

I'm getting a NoClassDefFoundError.  Any ideas why?

Apr 7, 2011 1:12:43 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NoClassDefFoundError: Could not initialize class
org.apache.solr.search.QParserPlugin
    at org.apache.solr.core.SolrCore.initQParsers(SolrCore.java:1444)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:548)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:428)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
    at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:108)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4450)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
    at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:850)
    at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:724)
    at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:493)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1206)
    at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:314)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
    at org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
    at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
    at org.apache.catalina.core.StandardService.start(StandardService.java:516)
    at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:583)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413)

Tri

Re: class not found

2011-04-07 Thread Tri Nguyen
yes.





From: Ahmet Arslan iori...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Thu, April 7, 2011 3:23:56 PM
Subject: Re: class not found

 I wrote my own parser plugin.
 
 I'm getting a NoClassDefFoundError.  Any ideas why?

Did you put the jar file - the one that contains your custom code - into the /lib directory?
http://wiki.apache.org/solr/SolrPlugins


Re: class not found

2011-04-07 Thread Tri Nguyen
The jar containing the class is in here:

/usr/local/apache-tomcat-6.0.20/webapps/solr/WEB-INF/lib

for my setup.

Tri





From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thu, April 7, 2011 3:24:14 PM
Subject: Re: class not found

Can you give us some more details? I suspect the jar file containing
your plugin isn't in the Solr lib directory and/or you don't have a lib
directive in your solrconfig.xml file pointing to where your jar is.
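
For illustration, a minimal sketch of such a directive (the dir value is an
assumption; point it at wherever the plugin jar actually lives):

<!-- in solrconfig.xml -->
<lib dir="./lib" />

Alternatively, a jar dropped into the lib directory under solr home is picked
up without any config change.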

But that's a guess since you haven't provided any information about
what you did to try to use your plugin, like how you deployed it, how
you compiled it, how

Best
Erick

On Thu, Apr 7, 2011 at 4:43 PM, Tri Nguyen tringuye...@yahoo.com wrote:

 Hi,

 I wrote my own parser plugin.

 I'm getting a NoClassDefFoundError.  Any ideas why?

 Apr 7, 2011 1:12:43 PM org.apache.solr.common.SolrException log
 SEVERE: java.lang.NoClassDefFoundError: Could not initialize class
 org.apache.solr.search.QParserPlugin
        at org.apache.solr.core.SolrCore.initQParsers(SolrCore.java:1444)
        at org.apache.solr.core.SolrCore.init(SolrCore.java:548)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:428)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
        at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
        at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
        at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:108)
        at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)
        at org.apache.catalina.core.StandardContext.start(StandardContext.java:4450)
        at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
        at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
        at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
        at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:850)
        at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:724)
        at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:493)
        at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1206)
        at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:314)
        at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
        at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
        at org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
        at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
        at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
        at org.apache.catalina.core.StandardService.start(StandardService.java:516)
        at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
        at org.apache.catalina.startup.Catalina.start(Catalina.java:583)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
        at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413)

 Tri


Re: adding a TimerTask

2011-02-19 Thread Tri Nguyen
Seems like one way is to write a servlet whose init method creates a TimerTask.
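
A minimal sketch of that approach (class name and interval are assumptions;
the servlet would be declared in web.xml with load-on-startup):

import java.util.Timer;
import java.util.TimerTask;
import javax.servlet.http.HttpServlet;

public class TimerStartupServlet extends HttpServlet {
    private Timer timer;

    @Override
    public void init() {
        // daemon thread, so it won't keep the container alive at shutdown
        timer = new Timer("my-task", true);
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                // periodic work goes here
            }
        }, 0L, 60L * 1000L); // start now, repeat every minute
    }

    @Override
    public void destroy() {
        timer.cancel(); // stop the timer when the webapp is undeployed
    }
}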





From: Tri Nguyen tringuye...@yahoo.com
To: solr user solr-user@lucene.apache.org
Sent: Fri, February 18, 2011 6:02:44 PM
Subject: adding a TimerTask

Hi,

How can I add a TimerTask to Solr?

Tri

Re: slave out of sync

2011-02-19 Thread Tri Nguyen
There is an HTTP API where I can look at the latest replication and check
whether the ERROR keyword appears.  If so, the latest replication failed.
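
A minimal sketch of that check (host, port, and handler path are assumptions;
the ERROR scan follows the description above):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class ReplicationCheck {
    public static void main(String[] args) throws Exception {
        URL details =
                new URL("http://slavehost:8080/solr/replication?command=details");
        BufferedReader in = new BufferedReader(
                new InputStreamReader(details.openStream(), "UTF-8"));
        boolean failed = false;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.contains("ERROR")) {
                failed = true; // the latest replication reported an error
            }
        }
        in.close();
        System.out.println(failed ? "last replication FAILED" : "replication ok");
    }
}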





From: Otis Gospodnetic otis_gospodne...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Wed, February 16, 2011 11:31:26 AM
Subject: Re: slave out of sync

Hi Tri,

You could look at the stats page for each slave and compare the number of docs 
in them.  The one(s) that are off from the rest/majority are out of sync.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Tri Nguyen tringuye...@yahoo.com
 To: solr-user@lucene.apache.org
 Sent: Mon, February 14, 2011 7:19:58 PM
 Subject: slave out of sync
 
 Hi,
 
 We're thinking of having a master-slave configuration where there are multiple
 slaves.  Let's say during replication, one of the slaves does not replicate
 properly.
 
 How will we detect that the one slave is out of sync?
 
 Tri


adding a TimerTask

2011-02-18 Thread Tri Nguyen
Hi,

How can I add a TimerTask to Solr?

Tri

Re: rollback to other versions of index

2011-02-15 Thread Tri Nguyen
Hi,

Wanted to explain my situation in more detail.

I have a master which never adds or deletes documents incrementally.  I just
run the dataimport with autocommit.

Seems like I'll need to make a custom DeletionPolicy to keep more than one
index around.

I'm accessing indices from Solr.  How do I tell Solr to use a particular index?

Thanks,

Tri





From: Michael McCandless luc...@mikemccandless.com
To: solr-user@lucene.apache.org
Sent: Tue, February 15, 2011 5:36:49 AM
Subject: Re: rollback to other versions of index

Lucene is able to do this, if you make a custom DeletionPolicy (which
controls when commit points are deleted).

By default Lucene only saves the most recent commit
(KeepOnlyLastCommitDeletionPolicy), but if your policy keeps more
around, then you can open an IndexReader or IndexWriter on any
IndexCommit.
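
For example, a minimal sketch of such a policy (class name and N are
assumptions; Lucene passes the commits ordered oldest first):

import java.util.List;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexDeletionPolicy;

public class KeepLastNCommitsDeletionPolicy implements IndexDeletionPolicy {
    private final int n;

    public KeepLastNCommitsDeletionPolicy(int n) {
        this.n = n;
    }

    public void onInit(List<? extends IndexCommit> commits) {
        onCommit(commits);
    }

    public void onCommit(List<? extends IndexCommit> commits) {
        // delete everything except the newest n commit points
        for (int i = 0; i < commits.size() - n; i++) {
            commits.get(i).delete();
        }
    }
}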

Any changes (including optimize, and even opening a new IW with
create=true) are safe within a commit; Lucene is fully transactional.

For example, I use this for benchmarking: I save 4 commit points in a
single index.  First is a multi-segment index, second is the same
index with 5% deletions, third is an optimized index, and fourth is
the optimized index with 5% deletions.  This gives me a single index
w/ 4 different commit points, so I can then benchmark searching
against any of those 4.

Mike

On Tue, Feb 15, 2011 at 4:43 AM, Jan Høydahl jan@cominvent.com wrote:
 Yes and no. The index grows like an onion, adding new segments for each commit.
 There is no API to remove the newly added segments, but I guess you could hack
 something.

 The other problem is that as soon as you trigger an optimize() all history is
 gone, as the segments are merged into one. Optimize normally happens
 automatically behind the scenes. You could turn off merging, but that will
 badly hurt your performance after some time and ultimately crash your OS.

 Since you only need a few versions back, you COULD write your own custom
 mergePolicy, always preserving at least N versions. But beware that a version
 may be ONE document or many documents, depending on how you commit or if
 autoCommit is active. So if you go this route you also need strict control
 over your commits.

 Perhaps the best option is to handle this on the feeding client side, where
 you keep a buffer of the N last docs. Then you can freely roll back or
 re-index as you choose, based on time, number of docs, etc.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com

 On 15. feb. 2011, at 01.21, Tri Nguyen wrote:

 Hi,

 Does solr version each index build?

 We'd like to be able to roll back to not just the previous version but maybe
 a few versions before the current one.

 Thanks,

 Tri




slave out of sync

2011-02-14 Thread Tri Nguyen
Hi,

We're thinking of having a master-slave configuration where there are multiple 
slaves.  Let's say during replication, one of the slaves does not replicate 
properly.

How will we detect that the one slave is out of sync?

Tri

rollback to other versions of index

2011-02-14 Thread Tri Nguyen
Hi,

Does solr version each index build?  

We'd like to be able to roll back to not just the previous version but maybe
a few versions before the current one.

Thanks,

Tri

running optimize on master

2011-02-10 Thread Tri Nguyen
Hi,

I've read that running optimize is similar to running defrag on a hard disk.
Deleted docs are removed and segments are reorganized for faster searching.

I have a couple questions.

Is optimize necessary if  I never delete documents?  I build the index every 
hour but we don't delete in between builds.

Secondly, what kind of reorganizing of segments is done to make searches faster?

Thanks,

Tri

Re: running optimize on master

2011-02-10 Thread Tri Nguyen
Does optimize merge all segments into 1 segment on the master after the build?

Or, after the build, is there only 1 segment?

thanks,

Tri





From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thu, February 10, 2011 5:08:44 PM
Subject: Re: running optimize on master

Optimizing isn't necessary in your scenario, as you don't delete
documents and rebuild the whole thing each time anyway.

As for faster searches, this has largely been made obsolete
by recent changes in how indexes are built in the first place. Especially
as you can build your index in an hour, it's likely not big enough to
benefit from optimizing even under the old scenario.

So, unless you have some evidence that your queries are performing
poorly, I would just leave the optimize step off.

Best
Erick


On Thu, Feb 10, 2011 at 7:09 PM, Tri Nguyen tringuye...@yahoo.com wrote:
 Hi,

 I've read that running optimize is similar to running defrag on a hard disk.
 Deleted docs are removed and segments are reorganized for faster searching.

 I have a couple questions.

 Is optimize necessary if  I never delete documents?  I build the index every
 hour but we don't delete in between builds.

 Secondly, what kind of reorganizing of segments is done to make searches 
faster?

 Thanks,

 Tri


solr current working directory or reading config files

2011-02-09 Thread Tri Nguyen
Hi,

I have a class (in a jar) that reads from properties (text) files.  I have
these files in the same jar file as the class.

However, when my class reads those properties files, those files cannot be
found, since the working directory at runtime is Tomcat's bin directory.

I don't really want to put the config files in tomcat's bin directory.

How do I reconcile this?
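
For what it's worth, loading the file through the classloader instead of the
filesystem sidesteps the working directory entirely; a minimal sketch (the
resource name passed in is an assumption):

import java.io.InputStream;
import java.util.Properties;

public class ConfigLoader {
    // Loads a properties file packaged inside the jar, regardless of the
    // container's working directory.
    public static Properties load(String resource) throws Exception {
        InputStream in = ConfigLoader.class.getResourceAsStream(resource);
        if (in == null) {
            throw new IllegalStateException(resource + " not on classpath");
        }
        Properties props = new Properties();
        try {
            props.load(in);
        } finally {
            in.close();
        }
        return props;
    }
}

e.g. Properties p = ConfigLoader.load("/myconfig.properties");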

Tri

pre and post processing when building index

2011-02-09 Thread Tri Nguyen
Hi,

I'm scheduling solr to build every hour or so.

I'd like to do some pre and post processing for each index build.  The
preprocessing would do some checks and perhaps skip the build.

For post processing, I will do some checks and either commit or rollback the 
build.

Can I write a class and plug it into Solr for this?

Thanks,

Tri

communication between entity processor and solr DataImporter

2011-02-09 Thread Tri Nguyen
Hi,

I'd like to communicate errors between my entity processor and the
DataImporter.

Should there be an error in my entity processor, I'd like the index build to
roll back.  How can I do this?

I want to throw an exception of some sort.  The only thing I can think of is
to force a runtime exception to be thrown in nextRow() of the entity
processor, since runtime exceptions are not checked and do not have to be
declared in the nextRow() method signature.

How can I request that the nextRow() method signature be updated to throw
Exception?  Would it even make sense?

Tri

Re: solr current working directory or reading config files

2011-02-09 Thread Tri Nguyen
Wanted to add some more details to my problem.  I have many jars that have
their own config files, so I'd have to copy files for every jar.  Can Solr
read from the classpath (jar files)?

Yes, my war is always deployed to the same location under webapps.  I do
already have solr/home defined in web.xml.  I'll try copying my files in
there, but I would have to extract every jar file and do this manually.





From: Wilkes, Chris cwil...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wed, February 9, 2011 3:44:03 PM
Subject: Re: solr current working directory or reading config files

Is your war always deployed to the same location, i.e.
/usr/mycomp/myapplication/webapps/myapp.war?  If so then on startup copy the
files out of your directory and put them under CATALINA_BASE/solr
(/usr/mycomp/myapplication/solr) and in your war file have the
META-INF/context.xml JNDI setting point to that.

<Context>
  <Environment name="solr/home" type="java.lang.String"
      value="/usr/mycomp/myapplication/solr" override="true" />
</Context>

If you know of a way to reference CATALINA_BASE in the context.xml that would
make it easier.

On Feb 9, 2011, at 12:00 PM, Tri Nguyen wrote:

 Hi,
 
 I have a class (in a jar) that reads from properties (text) files.  I have
 these files in the same jar file as the class.

 However, when my class reads those properties files, those files cannot be
 found, since the working directory at runtime is Tomcat's bin directory.

 I don't really want to put the config files in tomcat's bin directory.

 How do I reconcile this?
 
 Tri

Re: communication between entity processor and solr DataImporter

2011-02-09 Thread Tri Nguyen
I can throw DataImportHandlerException (a runtime exception) from my entity
processor, which will force a rollback.
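
A minimal sketch of what that looks like (the fetch helper is hypothetical;
SEVERE is what makes DIH abort and roll back):

import java.util.Map;
import org.apache.solr.handler.dataimport.DataImportHandlerException;
import org.apache.solr.handler.dataimport.EntityProcessorBase;

public class FailFastEntityProcessor extends EntityProcessorBase {
    @Override
    public Map<String, Object> nextRow() {
        try {
            return fetchNextRow(); // hypothetical helper that reads the source
        } catch (Exception e) {
            // an unchecked exception, so nextRow()'s signature is unchanged
            throw new DataImportHandlerException(
                    DataImportHandlerException.SEVERE, "source failed, aborting", e);
        }
    }

    private Map<String, Object> fetchNextRow() throws Exception {
        return null; // placeholder: a real implementation reads the data source
    }
}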

Tri





From: Tri Nguyen tringuye...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Wed, February 9, 2011 3:50:05 PM
Subject: communication between entity processor and solr DataImporter

Hi,

I'd like to communicate errors between my entity processor and the
DataImporter.

Should there be an error in my entity processor, I'd like the index build to
roll back.  How can I do this?

I want to throw an exception of some sort.  The only thing I can think of is
to force a runtime exception to be thrown in nextRow() of the entity
processor, since runtime exceptions are not checked and do not have to be
declared in the nextRow() method signature.

How can I request that the nextRow() method signature be updated to throw
Exception?  Would it even make sense?

Tri

response when using my own QParserPlugin

2011-02-03 Thread Tri Nguyen
Hi,

I wrote a QParserPlugin.  When I hit Solr and use this QParserPlugin, the
response does not have the column names associated with the data, such as:
0 29 0 {!tnav} faketn1 CA city san francisco US 10 - - 495,496,497 
500,657,498,499 us:ca:san francisco faketn,fakeregression 037.74 -122.49 
faketn1 
faketn1 faketn1 faketn1 faketn1 99902837 
+3774-12250|+3774-12250@1|+3772-12252@2 94116:us 495,496,497 
fakecs,fakeatti,fakevenable 500,657,498,499 San Francisco 667 US 37.742369 
-122.491240 boldMain Dishes/bold boldPancakes/bold faketn1 2.99 Enjoy 
best chinese food. faketn1 1;0:0:0:0:8:20% off.0:0:0:3:0.0 4158281775 94116 
ACTION_MODEL TN CA 2350 Taraval St Enjoy best chinese food 40233 - 
5;10:ACTION_MAP0:3:0.315:ACTION_DRIVE_TO0:3:0.517:ACTION_IMPRESSION0:6:0.005014:ACTION_PROFILE0:3:0.111:ACTION_CALL0:3:0.3
 2027 - 



How do I get the data to be associated with the index columns so I can parse
it and know the context of the data (such as: this value is the business name,
this value is the address, etc.)?

---


I was hoping it would return something like this, or some sort of structure:

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">I_NAME_EXACT:faketn1</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <arr name="I_BASE_ID">
        <str>-</str>
        <str>-</str>
      </arr>
      <str name="I_BLOCK_CATEGORY_ID">495,496,497</str>
      <str name="I_CATEGORY_ID">500,657,498,499</str>
      <str name="I_CITY_DISTRICT">us:ca:san francisco</str>
      <str name="I_KEYWORD">faketn,fakeregression</str>
      <str name="I_LAT_RANGE">037.74</str>
      <str name="I_LON_RANGE">-122.49</str>
      <str name="I_NAME_AS_KEYWORD">faketn1</str>
      <str name="I_NAME_ENUM">faketn1</str>
      <str name="I_NAME_EXACT">faketn1</str>
      <str name="I_NAME_NGRAM">faketn1</str>
      <str name="I_NAME_PACK">faketn1</str>
      <str name="I_POI_ID">99902837</str>
      <str name="I_SPATIAL_BLOCK">+3774-12250|+3774-12250@1|+3772-12252@2</str>
      <str name="I_ZIP_DISTRICT">94116:us</str>
      <str name="S_BLOCK_CATEGORY_ID">495,496,497</str>
      <str name="S_BLOCK_KEYWORDS">fakecs,fakeatti,fakevenable</str>
      <str name="S_CATEGORY_ID">500,657,498,499</str>
      <str name="S_CITY">San Francisco</str>
      <str name="S_COMPAIGN_ID">667</str>
      <str name="S_COUNTRY">US</str>
      <str name="S_FAX" />
      <str name="S_LATITUDE">37.742369</str>
      <str name="S_LONGTITUDE">-122.491240</str>
      <str name="S_MENU">boldMain Dishes/bold boldPancakes/bold faketn1 2.99</str>
      <str name="S_MERCHANT_CONTENT">Enjoy best chinese food.</str>
      <str name="S_NAME">faketn1</str>
      <str name="S_OFFERS">1;0:0:0:0:8:20% off.0:0:0:3:0.0</str>
      <str name="S_PHONE_NUMBER">4158281775</str>
      <str name="S_POSTALCODE">94116</str>
      <str name="S_PRICEMODE">ACTION_MODEL</str>
      <str name="S_SOURCE_NAME">TN</str>
      <str name="S_SPONSOREDTEXT" />
      <str name="S_STATE">CA</str>
      <str name="S_STREET">2350 Taraval St</str>
      <str name="S_STREET2" />
      <str name="S_SUIT" />
      <str name="S_TAGLINE">Enjoy best chinese food</str>
      <str name="S_TARGET_DISTANCE_IN_METER">40233</str>
      <str name="S_TA_ID">-</str>
      <str name="S_USER_ACTIONS">5;10:ACTION_MAP0:3:0.315:ACTION_DRIVE_TO0:3:0.517:ACTION_IMPRESSION0:6:0.005014:ACTION_PROFILE0:3:0.111:ACTION_CALL0:3:0.3</str>
      <str name="S_VENDOR_ID">2027</str>
      <str name="S_WEBURL" />
      <str name="S_YPC_ID">-</str>
    </doc>
  </result>
</response>
 
Tri

performance during index switch

2011-01-19 Thread Tri Nguyen
Hi,
 
Are there performance issues during the index switch?
 
As the size of the index gets bigger, does response time slow down?  Are there
any studies on this?
 
Thanks,
 
Tri

Re: performance during index switch

2011-01-19 Thread Tri Nguyen
Yes, during a commit.
 
I'm planning to do as you suggested, having a master do the indexing and 
replicating the index to a slave which leads to my next questions.
 
While the slave replicates the index files from the master, how does that
impact performance on the slave?
 
Tri


--- On Wed, 1/19/11, Jonathan Rochkind rochk...@jhu.edu wrote:


From: Jonathan Rochkind rochk...@jhu.edu
Subject: Re: performance during index switch
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Date: Wednesday, January 19, 2011, 11:30 AM


During commit?

A commit (and especially an optimize) can be expensive in terms of both CPU and 
RAM as your index grows larger, leaving less CPU for querying, and possibly 
less RAM which can cause Java GC slowdowns in some cases.

A common suggestion is to use Solr replication to separate out a Solr index
that you index to, and then replicate to a slave index that actually serves
your queries. This should minimize any performance problems on your 'live' Solr
while indexing, although there's still something that has to be done for the
actual replication of course. Haven't tried it yet myself.  Plan to -- my plan
is actually to put them both on the same server (I've only got one), but in
separate JVMs, and on a server with enough CPU cores that hopefully the
indexing won't steal CPU the querying needs.

On 1/19/2011 2:23 PM, Tri Nguyen wrote:
 Hi,
 Are there performance issues during the index switch?
 As the size of the index gets bigger, does response time slow down?  Are
 there any studies on this?
 Thanks,
 Tri


Re: HTTP Status 400 - org.apache.lucene.queryParser.ParseException

2011-01-18 Thread Tri Nguyen
What's the alternative?

--- On Tue, 1/18/11, Erick Erickson erickerick...@gmail.com wrote:


From: Erick Erickson erickerick...@gmail.com
Subject: Re: HTTP Status 400 - org.apache.lucene.queryParser.ParseException
To: solr-user@lucene.apache.org
Date: Tuesday, January 18, 2011, 5:24 AM


Why do you want to do this? Because toString has never been
guaranteed to be re-parsable, even in Lucene, so it's not
surprising that taking a Lucene toString() clause and submitting
it to Solr doesn't work.
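
If the goal is simply "match at least one of these names", building the query
string by hand is the safer route; a minimal sketch (field name taken from the
query quoted below):

public class BuildQuery {
    public static void main(String[] args) {
        // With the default OR operator, either clause may match, which has
        // the same effect as two SHOULD clauses with
        // minimumNumberShouldMatch = 1.
        String q = "I_NAME_ENUM:kfc OR I_NAME_ENUM:\"best western\"";
        System.out.println(q);
    }
}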

Best
Erick

On Tue, Jan 18, 2011 at 4:49 AM, kun xiong xiongku...@gmail.com wrote:

 -- Forwarded message --
 From: kun xiong xiongku...@gmail.com
 Date: 2011/1/18
 Subject: HTTP Status 400 - org.apache.lucene.queryParser.ParseException
 To: solr-user@lucene.apache.org


 Hi all,
  I got a ParseException when I query solr with Lucene BooleanQuery
 expression (toString()).

 I use the default parser, LuceneQParserPlugin, which should support the whole
 Lucene syntax, right?

 Java Code:

 BooleanQuery bq = new BooleanQuery();
 Query q1 = new TermQuery(new Term("I_NAME_ENUM", "KFC"));
 Query q2 = new TermQuery(new Term("I_NAME_ENUM", "MCD"));
 bq.add(q1, Occur.SHOULD);
 bq.add(q2, Occur.SHOULD);
 bq.setMinimumNumberShouldMatch(1);
 String solrQuery = bq.toString();

 query string is : q=(I_NAME_ENUM:kfc I_NAME_ENUM:best western)~1

 Exceptions :

 *message* *org.apache.lucene.queryParser.ParseException: Cannot parse
 '(I_NAME_ENUM:kfc I_NAME_ENUM:best western)~1': Encountered <FUZZY_SLOP>
 "~1" at line 1, column 42. Was expecting one of: <EOF> <AND> ... <OR> ...
 <NOT> ... "+" ... "-" ... "(" ... "*" ... "^" ... <QUOTED> ... <TERM> ...
 <PREFIXTERM> ... <WILDTERM> ... "[" ... "{" ... <NUMBER> ...*

 *description* *The request sent by the client was syntactically incorrect
 (org.apache.lucene.queryParser.ParseException: Cannot parse
 '(I_NAME_ENUM:kfc I_NAME_ENUM:best western)~1': Encountered <FUZZY_SLOP>
 "~1" at line 1, column 42. Was expecting one of: <EOF> <AND> ... <OR> ...
 <NOT> ... "+" ... "-" ... "(" ... "*" ... "^" ... <QUOTED> ... <TERM> ...
 <PREFIXTERM> ... <WILDTERM> ... "[" ... "{" ... <NUMBER> ... ).*


 Anyone could help?


 Thanks

 Kun




using dismax

2011-01-18 Thread Tri Nguyen
Hi,
 
Maybe I'm missing something obvious.
 
I'm trying to use the dismax parser and it doesn't seem like I'm using it 
properly.
 
When I do this:
http://localhost:8080/solr/cs/select?q=(poi_id:3)
 
I get a row returned.
 
When I incorporate dismax and say mm=1, no results get returned.
http://localhost:8080/solr/cs/select?q=(poi_id:3)&defType=dismax&mm=1
 
What I wanted mm=1 to mean is that at least one query clause has to match.
 
What am I missing?
 
Thanks,
 
Tri

abort data import on errors

2011-01-04 Thread Tri Nguyen
Hi,
 
Is there a way to specify to abort (rollback) the data import should there be 
an error/exception?
 
If everything runs smoothly, commit the data import.
 
Thanks,
 
Tri

Re: abort data import on errors

2011-01-04 Thread Tri Nguyen
I didn't want to issue the rollback command myself, but rather have Solr
automatically detect exceptions and roll back when they occur.
 
There's probably an attribute I can configure to tell Solr to do this.
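
If I remember right, the DataImportHandler wiki documents an onError attribute
on entity (values abort, skip, continue), though availability depends on the
Solr version; a minimal sketch in data-config.xml (the entity itself is an
assumption):

<entity name="item"
        processor="SqlEntityProcessor"
        query="select id, name from item"
        onError="abort">
  <field column="id" name="id" />
  <field column="name" name="name" />
</entity>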
 
Tri

--- On Tue, 1/4/11, Markus Jelsma markus.jel...@openindex.io wrote:


From: Markus Jelsma markus.jel...@openindex.io
Subject: Re: abort data import on errors
To: solr-user@lucene.apache.org
Date: Tuesday, January 4, 2011, 4:57 PM


http://wiki.apache.org/solr/UpdateXmlMessages#A.22rollback.22

 Hi,
  
 Is there a way to specify to abort (rollback) the data import should there
 be an error/exception? 
 If everything runs smoothly, commit the data import.
  
 Thanks,
  
 Tri


solr benchmarks

2010-12-31 Thread Tri Nguyen
Hi,
 
I remember going through some page that had graphs of response times based on 
index size for solr.
 
Anyone know of such pages?
 
Internally, we have some requirements for response times and I'm trying to 
figure out when to shard the index.
 
Thanks,
 
Tri

exception obtaining write lock on startup

2010-12-30 Thread Tri Nguyen
Hi,
 
I'm getting this exception when I have 2 cores as masters.  Seems like one of
the cores obtains a lock (file) and then the other tries to obtain the same
one.  However, the first lock file is never deleted.
 
How do I fix this?
 
Dec 30, 2010 4:34:48 PM org.apache.solr.handler.ReplicationHandler inform
WARNING: Unable to get IndexCommit on startup
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out:
NativeFSLock@..\webapps\solr\tnsolr\data\index\lucene-fe3fc928a4bbfeb55082e49b32a70c10-write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:85)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1565)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1421)
    at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:191)
    at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
    at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
    at org.apache.solr.update.DirectUpdateHandler2.forceOpenWriter(DirectUpdateHandler2.java:376)
    at org.apache.solr.handler.ReplicationHandler.inform(ReplicationHandler.
 
 
Tri

Re: shard versus core

2010-12-20 Thread Tri Nguyen
Hi Erick,
 
Thanks for the explanation.
 
At what point does the index get so big that it affects performance and
sharding becomes appropriate?
 
Tri

--- On Sun, 12/19/10, Erick Erickson erickerick...@gmail.com wrote:


From: Erick Erickson erickerick...@gmail.com
Subject: Re: shard versus core
To: solr-user@lucene.apache.org
Date: Sunday, December 19, 2010, 7:36 AM


Well, they can be different beasts. First of all, different cores can have
different schemas, which is not true of shards. Also, shards are almost
always assumed to be running on different machines as a scaling technique,
whereas multiple cores run on a single Solr instance.

So using multiple cores is very similar to running multiple virtual Solr
servers on a single machine, each independent of the other. This can make
sense if, for instance, you wanted to have a bunch of small indexes all
on one machine. You could use multiple cores rather than multiple
instances of Solr. These indexes may or may not have anything to do with
each other.

Sharding, on the other hand, is almost always used to split a single logical
index up amongst multiple machines in order to improve performance. The
assumption usually is that the index is too big to give satisfactory
performance
on a single machine, so you'll split it into parts. That assumption really
implies that it makes no sense to put multiple shards on the #same# machine.

So really, the answer to your question is that you choose the right
technique
for the problem you're trying to solve. They aren't really different
solutions to
the same problem...

Hope this helps.
Erick

On Sun, Dec 19, 2010 at 4:07 AM, Tri Nguyen tringuye...@yahoo.com wrote:

 Hi,

 Was wondering about the pros and cons of using sharding versus cores.

 An index can be split up into multiple cores or multiple shards.

 So why one over the other?

 Thanks,


 tri


Re: shard versus core

2010-12-20 Thread Tri Nguyen
I thought about it some more and did some reading.  I suppose the answer
depends on what kind of response time is expected to be good enough.
 
I can do some stress testing and see if disk I/O is the bottleneck as the
index grows.  I can also look into optimizing/configuring Solr parameters to
help performance.  One thing I've read is that my disk should be at least 2
times the size of the index.
 
 


--- On Mon, 12/20/10, Tri Nguyen tringuye...@yahoo.com wrote:


From: Tri Nguyen tringuye...@yahoo.com
Subject: Re: shard versus core
To: solr-user@lucene.apache.org
Date: Monday, December 20, 2010, 4:04 AM


Hi Erick,
 
Thanks for the explanation.
 
At what point does the index get so big that it affects performance and
sharding becomes appropriate?
 
Tri

--- On Sun, 12/19/10, Erick Erickson erickerick...@gmail.com wrote:


From: Erick Erickson erickerick...@gmail.com
Subject: Re: shard versus core
To: solr-user@lucene.apache.org
Date: Sunday, December 19, 2010, 7:36 AM


Well, they can be different beasts. First of all, different cores can have
different schemas, which is not true of shards. Also, shards are almost
always assumed to be running on different machines as a scaling technique,
whereas multiple cores run on a single Solr instance.

So using multiple cores is very similar to running multiple virtual Solr
servers on a single machine, each independent of the other. This can make
sense if, for instance, you wanted to have a bunch of small indexes all
on one machine. You could use multiple cores rather than multiple
instances of Solr. These indexes may or may not have anything to do with
each other.

Sharding, on the other hand, is almost always used to split a single logical
index up amongst multiple machines in order to improve performance. The
assumption usually is that the index is too big to give satisfactory
performance
on a single machine, so you'll split it into parts. That assumption really
implies that it makes no sense to put multiple shards on the #same# machine.

So really, the answer to your question is that you choose the right
technique
for the problem you're trying to solve. They aren't really different
solutions to
the same problem...

Hope this helps.
Erick

On Sun, Dec 19, 2010 at 4:07 AM, Tri Nguyen tringuye...@yahoo.com wrote:

 Hi,

 Was wondering about the pros and cons of using sharding versus cores.

 An index can be split up into multiple cores or multiple shards.

 So why one over the other?

 Thanks,


 tri


master master, repeaters

2010-12-19 Thread Tri Nguyen
Hi,

In the master-slave configuration, I'm trying to figure out how to configure
the system setup for master failover.

Does Solr support a master-master setup?  From my readings, Solr does not.

I've read about repeaters as well, where the slave can act as a master.  When
the main master goes down, do the other slaves switch to the repeater?

Barring better solutions, I'm thinking about putting 2 masters behind a load
balancer.

If this is not implemented already, perhaps Solr can be updated to support a
list of masters for fault tolerance.

Tri

shard versus core

2010-12-19 Thread Tri Nguyen
Hi,

Was wondering about the pros and cons of using sharding versus cores.

An index can be split up into multiple cores or multiple shards.

So why one over the other?

Thanks,


tri

Re: master master, repeaters

2010-12-19 Thread Tri Nguyen
How do we tell the slaves to point to the new master without modifying the 
config files?  Can we do this while the slave is up, issuing a command to it?
 
Thanks,
 
Tri

--- On Sun, 12/19/10, Upayavira u...@odoko.co.uk wrote:


From: Upayavira u...@odoko.co.uk
Subject: Re: master master, repeaters
To: solr-user@lucene.apache.org
Date: Sunday, December 19, 2010, 10:13 AM


We had a (short) thread on this late last week. 

Solr doesn't support automatic failover of the master, at least in
1.4.1. I've been discussing with my colleague (Tommaso) about ways to
achieve this.

There's ways we could 'fake it', scripting the following:

* set up a 'backup' master, as a replica of the actual master
* monitor the master for 'up-ness'
* if it fails:
   * tell the master to start indexing to the backup instead
   * tell the slave(s) to connect to a different master (the backup)
* then, when the master is back:
   * wipe its index (backing up dir first?)
   * configure it to be a backup of the new master
   * make it pull a fresh index over

But, Jan Høydahl suggested using SolrCloud. I'm going to follow up on
how that might work in that thread.

Upayavira


On Sun, 19 Dec 2010 00:20 -0800, Tri Nguyen tringuye...@yahoo.com
wrote:
 Hi,

 In the master-slave configuration, I'm trying to figure out how to configure
 the system setup for master failover.

 Does Solr support a master-master setup?  From my readings, Solr does not.

 I've read about repeaters as well, where the slave can act as a master.
 When the main master goes down, do the other slaves switch to the repeater?

 Barring better solutions, I'm thinking about putting 2 masters behind a load
 balancer.

 If this is not implemented already, perhaps Solr can be updated to support a
 list of masters for fault tolerance.

 Tri


solr immediate response on data import

2010-12-09 Thread Tri Nguyen
Hi,

I do a data import with commit=false.  I get the response back saying it's
idle, and:

Total number of rows skipped = -1
Total number of rows processed = -1

This happens only the very first time after I start Solr.  Subsequent times it
returns not -1 but the number of rows it read from the datasource.

Why does it return -1?

And how would I interpret this?  Did the dataimport fail?

Thanks,

Tri

custom ping response

2010-12-07 Thread Tri Nguyen
Can I have a custom xml response for the ping request?

thanks,

Tri

Re: custom ping response

2010-12-07 Thread Tri Nguyen
I need to return this:

<?xml version="1.0" encoding="UTF-8"?>
<admin>
<status>
<name>Server</name>
<value>ok</value>
</status>
</admin>





From: Markus Jelsma markus.jel...@openindex.io
To: solr-user@lucene.apache.org
Cc: Tri Nguyen tringuye...@yahoo.com
Sent: Tue, December 7, 2010 4:27:32 PM
Subject: Re: custom ping response

Of course! The ping request handler behaves like any other request handler and
accepts at least the wt parameter [1]. Use xslt [2] to transform the output to
any desirable form, or use other response writers [3].

Why anyway, is it a load balancer that only wants an OK output or something?

[1]: http://wiki.apache.org/solr/CoreQueryParameters
[2]: http://wiki.apache.org/solr/XsltResponseWriter
[3]: http://wiki.apache.org/solr/QueryResponseWriter
 Can I have a custom xml response for the ping request?
 
 thanks,
 
 Tri


Re: custom ping response

2010-12-07 Thread Tri Nguyen
Hi,

I'm reading the wiki.

What does q=apache mean in the url?

http://localhost:8983/solr/select/?stylesheet=&q=apache&wt=xslt&tr=example.xsl

thanks,

tri

 




From: Markus Jelsma markus.jel...@openindex.io
To: Tri Nguyen tringuye...@yahoo.com
Cc: solr-user@lucene.apache.org
Sent: Tue, December 7, 2010 4:35:28 PM
Subject: Re: custom ping response

Well, you can go a long way with xslt, but I wouldn't know how to embed the
server name in the response, as Solr simply doesn't return that information.

You'd have to patch the response Solr's giving, or put a small script in front
that can embed the server name.
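
If the name element is just the literal text Server, though, an xslt transform
alone can produce the shape shown earlier; a minimal sketch (the stylesheet
would live in conf/xslt/, and the file name is an assumption):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8"/>
  <!-- rewrite the standard ping response into the admin/status shape -->
  <xsl:template match="/">
    <admin>
      <status>
        <name>Server</name>
        <value><xsl:value-of select="response/str[@name='status']"/></value>
      </status>
    </admin>
  </xsl:template>
</xsl:stylesheet>

It would be requested with wt=xslt and tr=ping.xsl (matching the file name) on
the ping handler URL.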

 I need to return this:
 
 <?xml version="1.0" encoding="UTF-8"?>
 <admin>
 <status>
 <name>Server</name>
 <value>ok</value>
 </status>
 </admin>
 
 
 
 
 
 From: Markus Jelsma markus.jel...@openindex.io
 To: solr-user@lucene.apache.org
 Cc: Tri Nguyen tringuye...@yahoo.com
 Sent: Tue, December 7, 2010 4:27:32 PM
 Subject: Re: custom ping response
 
 Of course! The ping request handler behaves like any other request handler
 and accepts at least the wt parameter [1]. Use xslt [2] to transform the
 output to any desirable form, or use other response writers [3].
 
 Why anyway, is it a load balancer that only wants an OK output or
 something?
 
 [1]: http://wiki.apache.org/solr/CoreQueryParameters
 [2]: http://wiki.apache.org/solr/XsltResponseWriter
 [3]: http://wiki.apache.org/solr/QueryResponseWriter
 
  Can I have a custom xml response for the ping request?
  
  thanks,
  
  Tri


dataimports response returns before done?

2010-12-03 Thread Tri Nguyen
Hi,
 
After issuing a dataimport, I've noticed Solr returns a response prior to
finishing the import.  Is this correct?  Is there any way I can make Solr not
return until it finishes?
 
If not, how do I poll for the status of whether it finished or not?
 
thanks,
 
tri

solr response xsd

2010-11-22 Thread Tri Nguyen
Hi,
 
I'm trying to find the Solr response xsd.
 
Is this it here?
 
https://issues.apache.org/jira/browse/SOLR-17
 
I'd basically want to know if the data import passed or failed.  I can get the
xml string and search for completed, but I was wondering if I can use an xsd
to parse the response.
 
Or is there another way?
 
Here's the response I have, and I don't see the lst element for statusMessages
in the xsd.
 
<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">15</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
  </lst>
  <str name="command">full-import</str>
  <str name="status">idle</str>
  <str name="importResponse" />
  <lst name="statusMessages">
    <str name="Total Requests made to DataSource">0</str>
    <str name="Total Rows Fetched">0</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2010-11-22 17:20:42</str>
    <str name="">Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.</str>
    <str name="Committed">2010-11-22 17:20:43</str>
    <str name="Optimized">2010-11-22 17:20:43</str>
    <str name="Total Documents Processed">0</str>
    <str name="Time taken">0:0:0.375</str>
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>
 
Thanks,
 
Tri

data import scheduling

2010-11-11 Thread Tri Nguyen
Hi,

Has anyone gotten solr to schedule data imports at a certain time interval 
through configuring solr?

I tried setting interval=1, which should import every minute, but I don't see
it happening.

I'm trying to avoid cron jobs.

Thanks,

Tri

importing from java

2010-11-11 Thread Tri Nguyen
Hi,

I'm restricted to the following with regard to importing.

I have access to a list (Iterator) of Java objects I need to import into Solr.

Can I import the Java objects as part of Solr's data import interface
(whenever an HTTP request tells Solr to do a dataimport, it'll call my Java
class to get the objects)?

Before, I had direct read-only access to the db and specified the column
mappings, and things were fine with the data import.

But now I am restricted to using a .jar file that has an api to get the
records in the database, and I need to publish these records in the db.  I do
see SolrJ, but SolrJ is separate from the Solr webapp.

Can I write my own dataimporthandler?
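
For illustration, a custom EntityProcessor is usually the easier route here; a
minimal sketch (Record, RecordApi, and the field names are hypothetical
stand-ins for the api jar):

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.EntityProcessorBase;

public class JavaObjectEntityProcessor extends EntityProcessorBase {
    private Iterator<Record> records;

    @Override
    public void init(Context context) {
        super.init(context);
        records = RecordApi.fetchAll(); // hypothetical call into the api jar
    }

    @Override
    public Map<String, Object> nextRow() {
        if (records == null || !records.hasNext()) {
            return null; // null tells DIH this entity is exhausted
        }
        Record r = records.next();
        Map<String, Object> row = new HashMap<String, Object>();
        row.put("id", r.getId());     // keys match field columns in data-config
        row.put("name", r.getName());
        return row;
    }
}

It would be wired in with processor="JavaObjectEntityProcessor" on the entity
in data-config.xml.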

Thanks,

Tri

Re: importing from java

2010-11-11 Thread Tri Nguyen
Another question: can I write my own DataImportHandler class?

thanks,

Tri





From: Tri Nguyen tringuye...@yahoo.com
To: solr user solr-user@lucene.apache.org
Sent: Thu, November 11, 2010 7:01:25 PM
Subject: importing from java

Hi,

I'm restricted to the following with regard to importing.

I have access to a list (Iterator) of Java objects I need to import into Solr.

Can I import the Java objects as part of Solr's data import interface
(whenever an HTTP request tells Solr to do a dataimport, it'll call my Java
class to get the objects)?

Before, I had direct read-only access to the db and specified the column
mappings, and things were fine with the data import.

But now I am restricted to using a .jar file that has an api to get the
records in the database, and I need to publish these records in the db.  I do
see SolrJ, but SolrJ is separate from the Solr webapp.

Can I write my own dataimporthandler?

Thanks,

Tri

Re: scheduling imports and heartbeats

2010-11-10 Thread Tri Nguyen
i'm looking for another solution other than cron job.

can i configure solr to schedule imports?





From: Ranveer Kumar ranveer.s...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tue, November 9, 2010 8:13:03 PM
Subject: Re: scheduling imports and heartbeats

You should use cron for that..

On 10 Nov 2010 08:47, Tri Nguyen tringuye...@yahoo.com wrote:

Hi,

Can I configure solr to schedule imports at a specified time (say once a
day,
once an hour, etc)?

Also, does solr have some sort of heartbeat mechanism?

Thanks,

Tri


Re: scheduling imports and heartbeats

2010-11-10 Thread Tri Nguyen
Thanks for the tip, Ken.  I tried that, but I don't see the importing
happening when I check the status.

Below is what's in my dataimport.properties.

#Wed Nov 10 11:36:28 PST 2010
metadataObject.last_index_time=2010-09-20 11\:12\:47
interval=1
port=8080
server=localhost
params=/select?qt\=/dataimport&command\=full-import&clean\=true&commit\=true
webapp=solr
id.last_index_time=2010-11-10 11\:36\:27
syncEnabled=1
last_index_time=2010-11-10 11\:36\:27



 




From: Ken Stanley doh...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wed, November 10, 2010 4:41:17 AM
Subject: Re: scheduling imports and heartbeats

On Tue, Nov 9, 2010 at 10:16 PM, Tri Nguyen tringuye...@yahoo.com wrote:
 Hi,

 Can I configure solr to schedule imports at a specified time (say once a day,
 once an hour, etc)?

 Also, does solr have some sort of heartbeat mechanism?

 Thanks,

 Tri

Tri,

If you use the DataImportHandler (DIH), you can set up a
dataimport.properties file that can be configured to import on
intervals.

http://wiki.apache.org/solr/DataImportHandler#dataimport.properties_example

As for heartbeat, you can use the ping handler (default is
/admin/ping) to check the status of the servlet.

- Ken


scheduling imports and heartbeats

2010-11-09 Thread Tri Nguyen
Hi,
 
Can I configure solr to schedule imports at a specified time (say once a day, 
once an hour, etc)?
 
Also, does solr have some sort of heartbeat mechanism?
 
Thanks,
 
Tri

searching while importing

2010-10-13 Thread Tri Nguyen
Hi,
 
Can I perform searches against the index while it is being imported?
 
Does importing add 1 document at a time or will solr make a temporary index and 
switch to that index when indexing is done?
 
Thanks,
 
Tri

Re: searching while importing

2010-10-13 Thread Tri Nguyen
Hi,

As long as I can search on the current (older) index while importing, I'm 
good.  I've tested this and I can search the older index while data-importing 
the newer index.

So can you search the older index during your 5-hour wait?

Thanks,

Tri





From: Shawn Heisey s...@elyograg.org
To: solr-user@lucene.apache.org
Sent: Wed, October 13, 2010 3:38:48 PM
Subject: Re: searching while importing

If you are using the DataImportHandler, you will not be able to search new data 
until the full-import or delta-import is complete and the update is committed.  
When I do a full reindex, it takes about 5 hours, and until it is finished, I 
cannot search it.

I have not tried to issue a manual commit in the middle of an import to see 
whether that makes data inserted up to that point searchable, but I would not 
expect that to work.

If you need this kind of functionality, you may need to change your build
system so that a full import clears the index manually and then does a series
of delta-import batches.


On 10/13/2010 3:51 PM, Tri Nguyen wrote:
 Hi,
 Can I perform searches against the index while it is being imported?
 Does importing add 1 document at a time or will solr make a temporary index
 and switch to that index when indexing is done?
 Thanks,
 Tri