Re: SolrJ 4.0 Beta maxConnectionsPerHost

2012-10-10 Thread Briggs Thompson
There are other updates that happen on the server that do not fail, so the
answer to your question is yes.

On Wed, Oct 10, 2012 at 8:12 AM, Sami Siren ssi...@gmail.com wrote:

 On Wed, Oct 10, 2012 at 12:02 AM, Briggs Thompson
 w.briggs.thomp...@gmail.com wrote:
  *Sami*
  The client IS
  instantiated only once and not for every request. I was curious if this
 was
  part of the problem. Do I need to re-instantiate the object for each
  request made?

 No, it is expensive if you instantiate the client every time.

 When the client seems to be hanging, can you still access the Solr
 instance normally and execute updates/searches from other clients?

 --
  Sami Siren
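A minimal sketch of the shared-client pattern Sami describes - the core URL is a placeholder; HttpSolrServer pools its HTTP connections and is safe to share across threads, so one instance per Solr endpoint is enough:

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class SolrClientHolder {
    // Created once at startup and reused for every request.
    public static final HttpSolrServer SERVER =
        new HttpSolrServer("http://localhost:8983/solr/coupon");
}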



Re: SolrJ 4.0 Beta maxConnectionsPerHost

2012-10-10 Thread Briggs Thompson
They are both SolrJ.

What is happening is I have a batch indexer application that does a full
re-index once per day. I also have an incremental indexer that takes
items off a queue when they are updated.

The problem only happens when both are running at the same time - they also
run from the same machine. I am going to dig into this today and see what I
find - I didn't get around to it yesterday.

Question: I don't seem to see a StreamingUpdateSolrServer object in the 4.0
beta. I did see ConcurrentUpdateSolrServer - this seems like a similar
choice. Is this correct?
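For reference, ConcurrentUpdateSolrServer is the renamed StreamingUpdateSolrServer in 4.0. A minimal construction sketch, with placeholder URL, queue size, and thread count:

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;

// Buffers up to 1000 documents client-side and drains them
// with 4 background writer threads.
ConcurrentUpdateSolrServer server =
    new ConcurrentUpdateSolrServer("http://localhost:8983/solr/coupon", 1000, 4);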

On Wed, Oct 10, 2012 at 9:43 AM, Sami Siren ssi...@gmail.com wrote:

 On Wed, Oct 10, 2012 at 5:36 PM, Briggs Thompson
 w.briggs.thomp...@gmail.com wrote:
  There are other updates that happen on the server that do not fail, so
 the
  answer to your question is yes.

 The other updates are using solrj or something else?

 It would be helpful if you could prepare a simple Java program that
 uses SolrJ to demonstrate the problem. Based on the available
 information it is really difficult to guess what's happening.

 --
  Sami Siren



Re: SolrJ 4.0 Beta maxConnectionsPerHost

2012-10-10 Thread Briggs Thompson
Thanks for the heads up. I just tested this and you are right. I am making
a call to addBeans and it succeeds without any issue even when the server
is down. That sucks.

A big part of this process is reliant on knowing exactly what has made it
into the index and what has not, so this is a difficult problem to solve when
you can't catch exceptions. I was thinking I could execute a ping request
first to determine if the Solr server is still operational, but that
doesn't help if the updateRequestHandler fails.
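A ping check is at least cheap to add; a rough sketch using the stock SolrJ ping (with the caveat above that a healthy ping handler says nothing about the update handler):

try {
    // A non-zero status or an exception both mean "treat the server as down".
    if (solrServer.ping().getStatus() != 0) {
        // hold the batch and retry later
    }
} catch (Exception e) {
    // server unreachable
}

Here solrServer is the application's existing HttpSolrServer instance.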

On Wed, Oct 10, 2012 at 1:48 PM, Shawn Heisey s...@elyograg.org wrote:

 On 10/9/2012 3:02 PM, Briggs Thompson wrote:

 *Otis* - jstack is a great suggestion, thanks! The problem didn't happen

 this morning but next time it does I will certainly get the dump to see
 exactly where the app is swimming around. I haven't used
 StreamingUpdateSolrServer
 but I will see if that makes a difference. Are there any major drawbacks
 of
 going this route?


 One caveat -- when using the Streaming/Concurrent object, your application
 will not be notified when there is a problem indexing. I've been told there
 is a way to override a method in the object to allow trapping errors, but I
 have not seen sample code and haven't figured out how to do it.  I've filed
 an issue and a patch to fix this.  It's received some comments, but so far
 nobody has decided to commit it.

 https://issues.apache.org/jira/browse/SOLR-3284

 Thanks,
 Shawn
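A rough sketch of the override Shawn mentions, assuming the 4.0 ConcurrentUpdateSolrServer API; note it surfaces only the exception, not which documents failed, which is what SOLR-3284 aims to fix:

ConcurrentUpdateSolrServer server =
    new ConcurrentUpdateSolrServer("http://localhost:8983/solr/coupon", 1000, 4) {
        @Override
        public void handleError(Throwable ex) {
            // Runs on the background writer thread when an update fails;
            // record the failure so the application can replay the batch.
            System.err.println("Async update failed: " + ex);
        }
    };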




Re: SolrJ 4.0 Beta maxConnectionsPerHost

2012-10-09 Thread Briggs Thompson
Thanks all for your responses. For some reason the emails were getting
filtered out of my inbox.

*Otis* - jstack is a great suggestion, thanks! The problem didn't happen
this morning but next time it does I will certainly get the dump to see
exactly where the app is swimming around. I haven't used
StreamingUpdateSolrServer
but I will see if that makes a difference. Are there any major drawbacks of
going this route?

*Sami* - if you are referring to
config:maxConnections=200&maxConnectionsPerHost=8,
it showed up in the Solr logs, not the SolrJ logs. The client IS
instantiated only once and not for every request. I was curious if this was
part of the problem. Do I need to re-instantiate the object for each
request made? I figured there would be more overhead if I am re-creating
the connection several times when I never really need to shut it down, but
at this point the overhead would be minimal, so I will try that.

*Hoss* - The reason it seemed the client was creating the log entry is that
the indexer (solr *server*) was more or less dormant for several hours;
then I booted up my indexing *client* and the maxConnectionsPerHost tidbit
was spit out right away. I was looking for something in the solrconfig and
online but didn't find anything. I didn't look for very long, so I will
check it out again.

Some very good suggestions here. I appreciate everyone's feedback. I will
follow up after some experimentation.

Thanks,
Briggs Thompson


On Tue, Oct 9, 2012 at 11:19 AM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 : I did some digging and experimentation and found something interesting.
 : When starting up the application, I see the following in Solr logs:
 : Creating new http client,
 config:maxConnections=200&maxConnectionsPerHost=8
 ...
 : It seems as though the maxConnections and maxConnectionsPerHost are not
 : actually getting set. Anyone seen this problem or have an idea how to
 : resolve?

 To elaborate on sami's comment...

 If you are seeing this in the logs from your solr *server*, it is unlikely
 that it has anything to do with the settings you are making on your solr
 *client*. This is probably related to the http client created inside
 solr for communicating with other solr nodes (ie: replication, solr cloud
 distributed updates, solr cloud peersync, etc...), which is different
 from the properties you set on the http client in your solr client
 application.

 I believe there is a way to configure the defaults for the internally used
 http clients via solrconfig.xml, but off the top of my head I don't
 remember what that is.



 -Hoss
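For the client side, a sketch of capping the pool on a SolrJ 4.0 HttpSolrServer (the setters below apply to the internally created HttpClient; the numbers are illustrative):

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/coupon");
server.setMaxTotalConnections(200);          // across all hosts
server.setDefaultMaxConnectionsPerHost(32);  // per Solr host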



Re: SolrJ - IOException

2012-10-08 Thread Briggs Thompson
I have also just run into this a few times over the weekend in a newly
deployed system. We are running Solr 4.0 Beta (not using SolrCloud) and it
is hosted on AWS.

I have a RabbitMQ consumer that reads updates from a queue and posts
updates to Solr via SolrJ. There is quite a bit of error handling around
the indexing request, and even if Solr is not live the consumer application
successfully logs the exception and attempts to move along in the queue.
There are two consumer applications running at once, and at times they
process 400 requests per minute. The high-volume times are not necessarily
when this problem occurs, though.

This exception is causing the entire application to hang - which is
surprising considering all SolrJ logic is wrapped with try/catches. Has
anyone found out more information regarding the possible keep-alive bug?
Any insight is much appreciated.

Thanks,
Briggs Thompson


Oct 8, 2012 7:25:48 AM org.apache.http.impl.client.DefaultRequestDirector
tryExecute
INFO: I/O exception (java.net.SocketException) caught when processing
request: Broken pipe
Oct 8, 2012 7:25:48 AM org.apache.http.impl.client.DefaultRequestDirector
tryExecute
INFO: Retrying request
Oct 8, 2012 7:25:48 AM com..rabbitmq.worker.SolrWriter work
SEVERE: {id:4049703,datetime:2012-10-08 07:22:05}
IOException occured when talking to server at:
http://ec2-50-18-73-42.us-west-1.compute.amazonaws.com:8983/solr/coupon
server
org.apache.solr.client.solrj.SolrServerException: IOException occured when
talking to server at:
http://ec2-50-18-73-42.us-west-1.compute.amazonaws.com:8983/solr/coupon
server
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:362)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:69)
at org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:96)
at org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:79)
at com..solr.SolrIndexService.IndexCoupon(SolrIndexService.java:57)
at com..solr.SolrIndexService.Index(SolrIndexService.java:36)
at com..rabbitmq.worker.SolrWriter.work(SolrWriter.java:47)
at com..rabbitmq.job.Runner.run(Runner.java:84)
at com..rabbitmq.job.SolrConsumer.main(SolrConsumer.java:10)
Caused by: org.apache.http.client.ClientProtocolException
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:306)
... 10 more
Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot
retry request with a non-repeatable request entity. The cause lists the
reason the original request failed.
at
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:686)
at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:517)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
... 13 more
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at
org.apache.http.impl.io.AbstractSessionOutputBuffer.flushBuffer(AbstractSessionOutputBuffer.java:147)
at
org.apache.http.impl.io.AbstractSessionOutputBuffer.flush(AbstractSessionOutputBuffer.java:154)
at
org.apache.http.impl.conn.LoggingSessionOutputBuffer.flush(LoggingSessionOutputBuffer.java:95)
at
org.apache.http.impl.io.ChunkedOutputStream.flush(ChunkedOutputStream.java:178)
at
org.apache.http.entity.mime.content.InputStreamBody.writeTo(InputStreamBody.java:72)
at
org.apache.http.entity.mime.HttpMultipart.doWriteTo(HttpMultipart.java:206)
at org.apache.http.entity.mime.HttpMultipart.writeTo(HttpMultipart.java:224)
at
org.apache.http.entity.mime.MultipartEntity.writeTo(MultipartEntity.java:183)
at
org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
at
org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
at
org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:122)
at
org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:271)
at
org.apache.http.impl.conn.AbstractClientConnAdapter.sendRequestEntity(AbstractClientConnAdapter.java:227)
at
org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:257)
at
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
at
org.apache.http.impl.client.DefaultRequestDirector.tryExecute
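Until the root cause is understood, one blunt workaround is to treat any failed update as an undelivered message and hand it back to RabbitMQ, rather than relying on HttpClient's automatic retry, which cannot replay a streamed (non-repeatable) request body. A sketch only - the method name and the requeue mechanism are hypothetical placeholders for the consumer's own code:

import java.io.IOException;
import java.util.Collection;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

// Sketch: returns true if the batch was indexed; on false the caller
// should nack the RabbitMQ message so it is redelivered later.
static boolean tryIndex(HttpSolrServer solr, Collection<Object> docs) {
    for (int attempt = 1; attempt <= 3; attempt++) {
        try {
            solr.addBeans(docs);
            return true;
        } catch (SolrServerException e) {
            System.err.println("Indexing attempt " + attempt + " failed: " + e);
        } catch (IOException e) {
            System.err.println("Indexing attempt " + attempt + " failed: " + e);
        }
    }
    return false;
}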

Re: SolrJ - IOException

2012-10-08 Thread Briggs Thompson
Also note there were no exceptions in the actual Solr log, only on the
SolrJ side.

Thanks,
Briggs

On Mon, Oct 8, 2012 at 10:45 AM, Briggs Thompson 
w.briggs.thomp...@gmail.com wrote:

 I have also just run into this a few times over the weekend in a newly
 deployed system. We are running Solr 4.0 Beta (not using SolrCloud) and it
 is hosted on AWS.


4.0 Strange Commit/Replication Issue

2012-08-01 Thread Briggs Thompson
Hello all,

I am running 4.0 alpha and have encountered something I am unable to
explain. I am indexing content to a master server, and the data is
replicating to a slave. The odd part is that when searching through the UI,
no documents show up on master with a standard *:* query. All cache types
are set to zero. I know indexing is working because I am watching the logs
and I can see documents getting added, not to mention the data is written
to the filesystem. I have autocommit set to 60000 (1 minute), so it isn't a
commit issue.

The very strange part is that the slave is correctly replicating the data,
and it is searchable in the UI on the slave (but not master). I don't
understand how/why the data is visible on the slave and not visible on the
master. Does anyone have any thoughts on this or seen it before?

Thanks in advance!
Briggs


Re: 4.0 Strange Commit/Replication Issue

2012-08-01 Thread Briggs Thompson
That is the problem. I wasn't aware of that new feature in 4.0. Thanks for
the quick response, Tomás.

-Briggs
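The feature in question: in 4.0, an autocommit can commit without opening a new searcher, making documents durable on disk but invisible to search on the node that indexed them. A sketch of the relevant solrconfig.xml block, matching the one-minute setting above:

<autoCommit>
  <maxTime>60000</maxTime>          <!-- 1 minute -->
  <openSearcher>true</openSearcher> <!-- false = commit without exposing new segments to search -->
</autoCommit>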

On Wed, Aug 1, 2012 at 3:08 PM, Tomás Fernández Löbbe tomasflo...@gmail.com
 wrote:

 Could your autocommit in the master be using openSearcher=false? If you
 go to the Master admin, do you see that the searcher has all the segments
 that you see in the filesystem?






Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-19 Thread Briggs Thompson
This is unrelated for the most part, but the javabin update request handler
does not seem to be working properly when calling the SolrJ method
HttpSolrServer.deleteById(List<String> ids). A single Id gets deleted from
the index as opposed to the full list. The deletes appear properly in the
logs - all Ids sent are shown as deleted - although all but one remain in
the index.

I confirmed that the default update request handler deletes the list
properly, so this appears to be a problem with
the BinaryUpdateRequestHandler.

Not an issue for me, just spreading the word.

Thanks,
Briggs
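A blunt workaround until the handler is fixed, assuming the standard SolrJ 4.0 API, is to issue the deletes one id at a time:

for (String id : ids) {
    server.deleteById(id); // one delete per request sidesteps the list bug
}
server.commit();

Here server and ids are the application's existing HttpSolrServer and id list.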

On Thu, Jul 19, 2012 at 9:00 AM, Mark Miller markrmil...@gmail.com wrote:

 we really need to resolve that issue soon...

 On Jul 19, 2012, at 12:08 AM, Briggs Thompson wrote:

  Yury,
 
   Thank you so much! That was it. Man, I spent a good long while
   troubleshooting this. Probably would have spent quite a bit more time. I
  appreciate your help!!
 
  -Briggs
 
  On Wed, Jul 18, 2012 at 9:35 PM, Yury Kats yuryk...@yahoo.com wrote:
 
  On 7/18/2012 7:11 PM, Briggs Thompson wrote:
  I have realized this is not specific to SolrJ but to my instance of
  Solr. Using curl to delete by query is not working either.
 
  Can be this: https://issues.apache.org/jira/browse/SOLR-3432
 

 - Mark Miller
  lucidimagination.com

Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-19 Thread Briggs Thompson
Thanks Mark!

On Thu, Jul 19, 2012 at 4:07 PM, Mark Miller markrmil...@gmail.com wrote:

 https://issues.apache.org/jira/browse/SOLR-3649




 --
 - Mark

 http://www.lucidimagination.com



Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-18 Thread Briggs Thompson
I have realized this is not specific to SolrJ but to my instance of Solr.
Using curl to delete by query is not working either.

Running
curl http://localhost:8983/solr/coupon/update -H 'Content-Type: text/xml'
--data-binary '<delete><query>*:*</query></delete>'

Yields this in the logs:
INFO: [coupon] webapp=/solr path=/update
params={stream.body=<delete><query>*:*</query></delete>}
{deleteByQuery=*:*} 0 0

But the corpus of documents in the core does not change.

My solrconfig is pretty barebones at this point, but I attached it in case
anyone sees something strange. Anyone have any idea why documents aren't
getting deleted?

Thanks in advance,
Briggs Thompson
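For comparison, the equivalent SolrJ calls (placeholder URL), which were failing the same way:

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/coupon");
server.deleteByQuery("*:*"); // same delete the curl command issues
server.commit();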

On Wed, Jul 18, 2012 at 12:54 PM, Briggs Thompson 
w.briggs.thomp...@gmail.com wrote:

 Hello All,

 I am using 4.0 Alpha and running into an issue with indexing using
 HttpSolrServer (SolrJ).

 Relevant java code:
 HttpSolrServer solrServer = new HttpSolrServer(MY_SERVER);
 solrServer.setRequestWriter(new BinaryRequestWriter());

 Relevant Solrconfig.xml content:

   <requestHandler name="/update" class="solr.UpdateRequestHandler" />

   <requestHandler name="/update/javabin"
       class="solr.BinaryUpdateRequestHandler" />

 Indexing documents works perfectly fine (using addBeans()); however, when
 trying to do deletes I am seeing issues. I tried
 solrServer.deleteByQuery("*:*") followed by a commit and optimize, and
 nothing is deleted.

 The response from the delete request is a success, and even in the Solr
 logs I see the following:

 INFO: [coupon] webapp=/solr path=/update/javabin
 params={wt=javabin&version=2} {deleteByQuery=*:*} 0 1
 Jul 18, 2012 11:15:34 AM org.apache.solr.update.DirectUpdateHandler2 commit
 INFO: start
 commit{flags=0,version=0,optimize=true,openSearcher=true,waitSearcher=false,expungeDeletes=false,softCommit=false}



 I tried removing the binaryRequestWriter and having the request sent in
 the default format, and I get the following error.

 SEVERE: org.apache.solr.common.SolrException: Unsupported ContentType:
 application/octet-stream  Not in: [application/xml, text/csv, text/json,
 application/csv, application/javabin, text/xml, application/json]

 at
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:86)
 at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
  at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561)
  at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
  at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
 at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
  at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
 at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
  at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
 at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
  at
 org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
 at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
  at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
 at
 org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1001)
  at
 org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
 at
 org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
  at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
  at java.lang.Thread.run(Thread.java:636)


 I thought that an optimize does the same thing as expungeDeletes, but in
 the log I see expungeDeletes=false. Is there a way to force that using
 SolrJ?

 Thanks in advance,
 Briggs



Re: Solr 4 Alpha SolrJ Indexing Issue

2012-07-18 Thread Briggs Thompson
Yury,

Thank you so much! That was it. Man, I spent a good long while
troubleshooting this. Probably would have spent quite a bit more time. I
appreciate your help!!

-Briggs

On Wed, Jul 18, 2012 at 9:35 PM, Yury Kats yuryk...@yahoo.com wrote:

 On 7/18/2012 7:11 PM, Briggs Thompson wrote:
  I have realized this is not specific to SolrJ but to my instance of
 Solr. Using curl to delete by query is not working either.

 Can be this: https://issues.apache.org/jira/browse/SOLR-3432
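For anyone landing here: SOLR-3432 covers deletes being silently ignored when the update log is enabled but the schema has no _version_ field. The usual fix, sketched from the 4.0 example schema, is to add:

<field name="_version_" type="long" indexed="true" stored="true"/>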



Re: Trunk error in Tomcat

2012-07-03 Thread Briggs Thompson
Thanks Erik. If anyone else has any ideas about the NoSuchFieldError issue
please let me know. Thanks!

-Briggs

On Mon, Jul 2, 2012 at 6:27 PM, Erik Hatcher erik.hatc...@gmail.com wrote:

 Interestingly, I just logged the issue of it not showing the right error
 in the UI here: https://issues.apache.org/jira/browse/SOLR-3591

 As for your specific issue, not sure, but the error should at least also
 show in the admin view.

 Erik


 On Jul 2, 2012, at 18:59 , Briggs Thompson wrote:

  Hi All,
 
  I just grabbed the latest version of trunk and am having a hard time
  getting it running properly in Tomcat. It does work fine in Jetty. The
  admin screen gives the following error:
  "This interface requires that you activate the admin request handlers, add
  the following configuration to your Solrconfig.xml"

  I am pretty certain the front-end error has nothing to do with the actual
  error. I have seen some other folks on the distro with the same problem,
  but none of the threads have a solution (that I could find). Below is the
  stack trace. I also tried with different versions of Lucene but none
  worked. Note: my index is EMPTY and I am not migrating over an index built
  with a previous version of Lucene. I think I ran into this a while ago
  with an earlier version of trunk, but I don't recall doing anything to fix
  it. Anyhow, if anyone has an idea with this one, please let me know.
 
  Thanks!
  Briggs Thompson
 
  SEVERE: null:java.lang.NoSuchFieldError: LUCENE_50
  at
 
 org.apache.solr.analysis.SynonymFilterFactory$1.createComponents(SynonymFilterFactory.java:83)
  at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:83)
  at
 
 org.apache.lucene.analysis.synonym.SynonymMap$Builder.analyze(SynonymMap.java:120)
  at
 
 org.apache.lucene.analysis.synonym.SolrSynonymParser.addInternal(SolrSynonymParser.java:99)
  at
 
 org.apache.lucene.analysis.synonym.SolrSynonymParser.add(SolrSynonymParser.java:70)
  at
 
 org.apache.solr.analysis.SynonymFilterFactory.loadSolrSynonyms(SynonymFilterFactory.java:131)
  at
 
 org.apache.solr.analysis.SynonymFilterFactory.inform(SynonymFilterFactory.java:93)
  at
 
 org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:584)
  at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:112)
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:812)
  at org.apache.solr.core.CoreContainer.load(CoreContainer.java:510)
  at org.apache.solr.core.CoreContainer.load(CoreContainer.java:333)
  at
 
 org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:282)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:101)
  at
 
 org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277)
  at
 
 org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
  at
 
 org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
  at
 
 org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:103)
  at
 
 org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4649)
  at
 
 org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5305)
  at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
  at
 
 org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:899)
  at
 org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:875)
  at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:618)
  at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:963)
  at
 
 org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1600)
  at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:680)




Re: Trunk error in Tomcat

2012-07-03 Thread Briggs Thompson
Also, I forgot to include this before, but there is a client-side error:
a failed (404) request to the URL below.

http://localhost:8983/solr/null/admin/system?wt=json

On Tue, Jul 3, 2012 at 8:45 AM, Briggs Thompson w.briggs.thomp...@gmail.com
 wrote:

 Thanks Erik. If anyone else has any ideas about the NoSuchFieldError issue
 please let me know. Thanks!

 -Briggs


 On Mon, Jul 2, 2012 at 6:27 PM, Erik Hatcher erik.hatc...@gmail.comwrote:

 Interestingly, I just logged the issue of it not showing the right error
 in the UI here: https://issues.apache.org/jira/browse/SOLR-3591

 As for your specific issue, not sure, but the error should at least also
 show in the admin view.

 Erik







Re: Trunk error in Tomcat

2012-07-03 Thread Briggs Thompson
Wow! I didn't know 4.0 alpha was released today. I think I will just get
that going. Woo!!

On Tue, Jul 3, 2012 at 9:00 AM, Vadim Kisselmann v.kisselm...@gmail.comwrote:

 same problem here:


 https://mail.google.com/mail/u/0/?ui=2view=btopver=18zqbez0n5t35q=tomcat%20v.kisselmannqs=truesearch=queryth=13615cfb9a5064bdqt=kisselmann.1.tomcat.1.tomcat's.1.v.1cvid=3



 https://issues.apache.org/jira/browse/SOLR-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230056#comment-13230056

 I use an older solr-trunk version from February/March; it works. With
 newer versions from trunk I get the same error: "This interface
 requires that you activate the admin request handlers..."

 regards
 vadim




DataImportHandler w/ multivalued fields

2011-12-01 Thread Briggs Thompson
Hello Solr Community!

I am implementing a data connection to Solr through the Data Import Handler
and non-multivalued fields are working correctly, but multivalued fields
are not getting indexed properly.

I am new to DataImportHandler, but from what I could find, the entity is
the way to go for multivalued fields. The weird thing is that data is being
indexed for only one row, meaning only the first raw_tag gets populated.


Anyone have any ideas?
Thanks,
Briggs

This is the relevant part of the schema:

   <field name="raw_tag" type="text_en_lessAggressive" indexed="true"
stored="false" multivalued="true"/>
   <field name="raw_tag_string" type="string" indexed="false"
stored="true" multivalued="true"/>
   <copyField source="raw_tag" dest="raw_tag_string"/>

And the relevant part of data-import.xml:

<document name="merchant">
    <entity name="site"
      query="select * from site">
        <field column="siteId" name="siteId" />
        <field column="domain" name="domain" />
        <field column="aliasFor" name="aliasFor" />
        <field column="title" name="title" />
        <field column="description" name="description" />
        <field column="requests" name="requests" />
        <field column="requiresModeration" name="requiresModeration" />
        <field column="blocked" name="blocked" />
        <field column="affiliateLink" name="affiliateLink" />
        <field column="affiliateTracker" name="affiliateTracker" />
        <field column="affiliateNetwork" name="affiliateNetwork" />
        <field column="cjMerchantId" name="cjMerchantId" />
        <field column="thumbNail" name="thumbNail" />
        <field column="updateRankings" name="updateRankings" />
        <field column="couponCount" name="couponCount" />
        <field column="category" name="category" />
        <field column="adult" name="adult" />
        <field column="rank" name="rank" />
        <field column="redirectsTo" name="redirectsTo" />
        <field column="wwwRequired" name="wwwRequired" />
        <field column="avgSavings" name="avgSavings" />
        <field column="products" name="products" />
        <field column="nameChecked" name="nameChecked" />
        <field column="tempFlag" name="tempFlag" />
        <field column="created" name="created" />
        <field column="enableSplitTesting" name="enableSplitTesting" />
        <field column="affiliateLinklock" name="affiliateLinklock" />
        <field column="hasMobileSite" name="hasMobileSite" />
        <field column="blockSite" name="blockSite" />
        <entity name="merchant_tags" pk="siteId"
            query="select raw_tag, freetags.id,
            freetagged_objects.object_id as siteId
               from freetags
               inner join freetagged_objects
               on freetags.id=freetagged_objects.tag_id
               where freetagged_objects.object_id='${site.siteId}'">
            <field column="raw_tag" name="raw_tag"/>
        </entity>
    </entity>
</document>


Re: DataImportHandler w/ multivalued fields

2011-12-01 Thread Briggs Thompson
In addition, I tried a query like the one below and changed the column
definition to
   <field column="raw_tag" name="raw_tag" splitBy=", " />
and still no luck. It is indexing the full content now, but not multivalued.
It seems like the splitBy isn't working properly.

select group_concat(freetags.raw_tag separator ', ') as raw_tag, site.*
from site
left outer join
  (freetags inner join freetagged_objects)
 on (freetags.id = freetagged_objects.tag_id
   and site.siteId = freetagged_objects.object_id)
group  by site.siteId

Am I doing something wrong?
Thanks,
Briggs Thompson

On Thu, Dec 1, 2011 at 11:46 AM, Briggs Thompson 
w.briggs.thomp...@gmail.com wrote:

 Hello Solr Community!

 I am implementing a data connection to Solr through the Data Import
 Handler and non-multivalued fields are working correctly, but multivalued
 fields are not getting indexed properly.




Re: DataImportHandler w/ multivalued fields

2011-12-01 Thread Briggs Thompson
Hey Rahul,

Thanks for the response. I actually just figured it out, thankfully :). To
answer your question, the raw_tag is indexed and not stored (tokenized),
and then there is a copyField from raw_tag to raw_tag_string which would
be used for facets. That *should have* been displayed in the results.

The silly mistake I made was not camel-casing multiValued, which is
clearly the source of the problem.

The second email I sent, changing the query and using the split for the
multivalued field, had an error in it in the form of a missing attribute:
transformer="RegexTransformer"
in the entity declaration.

Anyhow, thanks for the quick response!

Briggs
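Putting the two fixes together, a sketch of the corrected pieces - the camel-cased multiValued attribute in schema.xml, and the RegexTransformer declared on the entity whose field uses splitBy:

<field name="raw_tag" type="text_en_lessAggressive" indexed="true"
       stored="false" multiValued="true"/>

<entity name="site" transformer="RegexTransformer"
        query="select group_concat(freetags.raw_tag separator ', ') as raw_tag, site.*
               from site
               left outer join (freetags inner join freetagged_objects)
                 on (freetags.id = freetagged_objects.tag_id
                 and site.siteId = freetagged_objects.object_id)
               group by site.siteId">
    <field column="raw_tag" name="raw_tag" splitBy=", "/>
</entity>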


On Thu, Dec 1, 2011 at 12:57 PM, Rahul Warawdekar 
rahul.warawde...@gmail.com wrote:

 Hi Briggs,

 By saying "multivalued fields are not getting indexed properly", do you
 mean to say that you are not able to search on those fields?
 Have you tried actually searching your Solr index for those multivalued
 terms to make sure it returns the search results?

 One possibility could be that the multivalued fields are getting indexed
 correctly and are searchable.
 However, since your schema.xml has a raw_tag field whose stored
 attribute is set to false, you may not be able to see those fields.




Re: difference between shard and core in solr

2011-07-18 Thread Briggs Thompson
I think everything you said is correct for static schemas, but a single core
does not necessarily have a fixed schema, since you can have dynamic
fields.

With dynamic fields, you can have multiple types of documents in the same
index (core), and multiple types of indexed fields specific to individual
document types - all in the same core.

Briggs Thompson
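A sketch of what that looks like in schema.xml - the type names and suffix conventions are illustrative:

<!-- discriminator so one core can hold several document types -->
<field name="docType" type="string" indexed="true" stored="true"/>

<!-- any field ending in _s or _txt is accepted without being declared -->
<dynamicField name="*_s"   type="string"  indexed="true" stored="true"/>
<dynamicField name="*_txt" type="text_en" indexed="true" stored="true"/>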



On Mon, Jul 18, 2011 at 2:22 AM, pravesh suyalprav...@yahoo.com wrote:

 a single core is an index with the same schema - is this what a core really is?

  YES. A single core is an independent index with its own unique schema. You
 go with a new core for cases where your schema/analysis/search requirements
 are completely different from your existing core(s).

 can a single core contain two separate indexes with different schemas in it?

 NO (for same reason as explained above).

 Does a shard refer to a collection of indexes on a single physical machine?
 Can a single core be present in different shards?

 You can think of a Shard as a big index distributed across a cluster of
 machines. So all shards belonging to a single core share the same
 schema/analysis/search requirements. You go with sharding when the index is
 not scalable on a single machine, or when it grows really big in size.


 Thanx
 Pravesh




Re: Need help with troublesome wildcard query

2011-07-08 Thread Briggs Thompson
Hey Chris,
Removing the ORs in each query might help narrow down the problem, but I
suggest you run this through the query analyzer in order to see where it is
dropping out. It is a great tool for troubleshooting issues like these.

I see a few things here.

   - for leading wildcard queries, you should include the
   reverseWildcardFilterFactory. Check out the documentation here:
   
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ReversedWildcardFilterFactory
   - Your result might get dropped because you are trying to do wildcard
   searches on a stemmed field. Wildcard searches on a stemmed field are
   counter-intuitive: if you index "computers", it may be stemmed to
   "comput", in which case the wildcard query computer* would not match.
  - If you want to support stemming and wildcard searches, I suggest
  creating a copy field with an un-stemmed field type definition.

Don't forget if you modify your field type definition, you need to
re-index.
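A sketch of that combination - an unstemmed copy of title whose index analyzer also emits reversed tokens for fast leading wildcards (names are illustrative):

<fieldType name="text_wildcard" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- also index reversed tokens so *foo style queries stay fast -->
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title_wild" type="text_wildcard" indexed="true" stored="false"/>
<copyField source="title" dest="title_wild"/>

Wildcard queries would then target title_wild, while normal searches keep using the stemmed title field.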

In response to your question about text_ws, this is just a different field
type definition that essentially splits on whiteSpaces. You should use that
if that is what the desired search logic is, but it probably isn't. Check
out the documentation on each of the tokenizers and filter factories in your
text field type and see what you need and what you don't to satisfy your
use cases.

Hope that helps,
Briggs Thompson


On Fri, Jul 8, 2011 at 9:03 AM, Christopher Cato 
christopher.c...@minimedia.se wrote:

 Hi Briggs. Thanks for taking the time. I have the query nearly working now;
 currently this is how it looks when it matches on the title "Super
 Technocrane 30" and others with similar names:

 INFO: [] webapp=/solr path=/select/
 params={qf=title^40.0&hl.fl=title&wt=json&rows=10&fl=*,score&start=0&q=(title:*super*+AND+*technocran*)+OR+(title:*super*+AND+*technocran)&qt=standard&fq=type:product+AND+language:sv}
 hits=3 status=0 QTime=1

 Adding another letter stops it matching:

 INFO: [] webapp=/solr path=/select/
 params={qf=title^40.0&hl.fl=title&wt=json&rows=10&fl=*,score&start=0&q=(title:*super*+AND+*technocrane*)+OR+(title:*super*+AND+*technocrane)&qt=standard&fq=type:product+AND+language:sv}
 hits=0 status=0 QTime=0

 The field type definitions are as follows:

 <field name="title" type="text" indexed="true" stored="true"
 termVectors="true" omitNorms="true"/>

 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <charFilter class="solr.MappingCharFilterFactory"
         mapping="mapping-ISOLatin1Accent.txt"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <!-- in this example, we will only use synonyms at query time
     <filter class="solr.SynonymFilterFactory"
         synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
     -->
     <!-- Case insensitive stop word removal.
       add enablePositionIncrements=true in both the index and query
       analyzers to leave a 'gap' for more accurate phrase queries.
     -->
     <filter class="solr.StopFilterFactory"
         ignoreCase="true"
         words="stopwords.txt"
         enablePositionIncrements="true"
         />
     <filter class="solr.WordDelimiterFilterFactory"
         generateWordParts="1"
         generateNumberParts="1"
         catenateWords="1"
         catenateNumbers="1"
         catenateAll="0"
         splitOnCaseChange="1"
         preserveOriginal="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory" language="English"
         protected="protwords.txt"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <charFilter class="solr.MappingCharFilterFactory"
         mapping="mapping-ISOLatin1Accent.txt"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
         ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory"
         ignoreCase="true"
         words="stopwords.txt"
         enablePositionIncrements="true"
         />
     <filter class="solr.WordDelimiterFilterFactory"
         generateWordParts="1"
         generateNumberParts="1"
         catenateWords="0"
         catenateNumbers="0"
         catenateAll="0"
         splitOnCaseChange="1"
         preserveOriginal="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory" language="English"
         protected="protwords.txt"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>


 There is also a type definition that is called text_ws, should I use that
 instead and change text to text_ws in the field definition for title?

 <!-- A text field that only splits on whitespace for exact matching of
 words -->
 <fieldType name="text_ws" class="solr.TextField"
     positionIncrementGap="100">
   <analyzer>

Re: Need help with troublesome wildcard query

2011-07-07 Thread Briggs Thompson
Hello Christopher,

Can you provide the exact query sent to Solr for the one-word query and also
the two-word query? The field type definition for your title field would be
useful too.

From what I understand, Solr should be able to handle your use case. I am
guessing it is a problem with how the field is defined, assuming the query
is correct.

Briggs Thompson

On Thu, Jul 7, 2011 at 12:22 PM, Christopher Cato 
christopher.c...@minimedia.se wrote:

 Hi, I'm running Solr 3.2 with edismax under Tomcat 6 via Drupal.

 I'm having some problems writing a query that matches a specific field on
 several words. I have implemented an AJAX search that basically takes
 whatever is in a form field and attempts to match documents. I'm not having
 much luck though. The first word always matches correctly, but as soon as I
 enter the second word I'm losing matches, and the third word doesn't give
 any matches at all.

 The title field that I'm searching contains a product name that may or may
 not have several words.

 The requirement is that the search should be progressive, i.e. as the user
 inputs words I should always return results that contain all of the words
 entered. I also have to correct bad input like an erroneous space in the
 product name, e.g. "product name" instead of "productname".

 I'm wondering if there isn't an easier way to query Solr? Ideally I'd want
 to say "give me all docs that have the following text in their titles". Is
 that possible?


 I'd really appreciate any help!


 Regards,
 Christopher Cato


Hit Rate

2011-07-05 Thread Briggs Thompson
Hello all,

Is there a good way to get the hit count of a search?

Example query:
textField:solr AND documentId:1000

Say the document with Id = 1000 has "solr" in it 13 times. Is there any way
to extract that number [13] in the response? I know we can return the score,
which is loosely related to hit counts via tf-idf, but for this case I need
the actual hit counts. I believe you can get this information from the logs,
but that is less useful if the use case is on the presentation layer.

I tried faceting on the query, but it seems like that returns the number of
documents the query matches rather than the hit count:
http://localhost:8080/solr/ExampleCore/select/?q=textField%3Asolr+AND+documentId%3A1246727&version=2.2&start=0&rows=10&indent=on&facet=true&facet.query=textField:solr

I was thinking that highlighting essentially returns the hit count if you
supply an unlimited number of snippets, but I imagine there must be a more
elegant solution.

Thanks in advance,
Briggs


Re: Hit Rate

2011-07-05 Thread Briggs Thompson
Yes indeed, that is what I was missing. Thanks Ahmet!

On Tue, Jul 5, 2011 at 12:48 PM, Ahmet Arslan iori...@yahoo.com wrote:


  Is there a good way to get the hit count of a search?
 
  Example query:
  textField:solr AND documentId:1000
 
  Say document with Id = 1000 has solr 13 times in the
  document. Any way to
  extract that number [13] in the response?

 Looks like you are looking for term frequency info:

 Two separate solutions:
 http://wiki.apache.org/solr/TermVectorComponent
 http://wiki.apache.org/solr/FunctionQuery#tf
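A sketch of the TermVectorComponent route, assuming textField is indexed with termVectors="true" and the request handler includes the component:

http://localhost:8080/solr/ExampleCore/select?q=documentId:1000&tv=true&tv.fl=textField&tv.tf=true

The response then lists each term in the matched document's textField with its raw term frequency, e.g. 13 for "solr".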





Dynamic Fields vs. Multicore

2011-06-28 Thread Briggs Thompson
Hi All,

I was searching around for documentation on the performance differences of
a sharded, single-schema, dynamic-field setup vs. a multi-core, static
multi-schema setup (which I currently have), but I have not had much luck
finding what I am looking for. I understand commits and optimizes will be
more intensive in a single core since there is more data (though I would
offset that by sharding heavily), but I am particularly curious about the
search performance implications.

I am interested in moving to the dynamic-field setup in order to implement a
better global search, but I want to make sure I understand the drawbacks of
hitting those datasets individually and globally after they are merged.
(NOTE: I would have a global field signifying the dataset type, which could
then be added to the filter query in order to create the subset for
individual dataset queries; see the example below.)
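For example, an individual-dataset request would just add a filter such as
fq=datasetType:products (with datasetType as the hypothetical global field)
to the shared global query.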

Some background about the data: it is extremely variable. Some documents
contain only 2 or 3 sentences, and some are 20-page extracted PDFs. There
would probably only be about 100-150 unique fields.

Any input is greatly appreciated!

Thanks,
Briggs Thompson