RE: Sharding and performance testing question.

2012-08-29 Thread Markus Klose
Hi Tiernan,

Check out whether SolrMeter fits your need (http://code.google.com/p/solrmeter/).



Best regards from Augsburg

Markus Klose
SHI Elektronische Medien GmbH 


-Original Message-
From: Tiernan OToole [mailto:lsmart...@gmail.com] 
Sent: Tuesday, 28 August 2012 16:52
To: solr-user@lucene.apache.org
Subject: Sharding and performance testing question.

Good morning all.

We are working on a project where we will have somewhere north of 10 Solr 
instances running in different data centers around the world. Each instance 
will have the same schema, but different data.

We have a test system up and running, mostly virtual boxes and Amazon 
instances, and have about 40 million records in the system. Anyway, we now 
have the problem of performance testing the system, and we want to make it as 
real world as possible.

To start, we have a list of query types: single words, multiple words, ANDed 
and ORed words, some fuzzy tests and the like. We have the original data we 
generated the indexes from; now all we need to do is figure out the best way 
to load test...

Any tips on load testing Solr? Ideally we would like caching to affect the 
results as little as possible.

Thanks in advance.

--Tiernan


Solr Shard Replicas sharing files

2012-08-29 Thread Christian von Wendt-Jensen
Hi,

I was wondering if it is possible to let all replicas of a shard share the 
physical Lucene files. That way you would only need one set of files on 
shared storage, and could then set up as many replicas as needed without 
copying files around. This would make it very fast to optimize and rebalance 
hardware resources as more shards are added.

What I was envisioning is a setup with one master doing all the indexing. All 
the shard replicas are then installed as a chain of slaves, each set up as 
both master and slave, such that the first replica replicates directly from 
the master, the next replica replicates from the first replica, and so on.

In this way only the first replica needs to write index files. When the next 
replica is triggered to replicate, it will find that all files are already up 
to date, and you then issue a "commit" to reload the index in memory, thereby 
being up-to-date. The master's commit triggers a cascade of replications 
which are all up-to-date immediately, and then it is a matter of a few 
seconds for the slaves to be in sync with the master.

Taking this thought further, the first replica could actually access the 
master's index files directly, and then be up-to-date without copying any files.

Would this setup be possible?



Med venlig hilsen / Best Regards

Christian von Wendt-Jensen
IT Team Lead, Customer Solutions

Infopaq International A/S
Kgs. Nytorv 22
DK-1050 København K

Phone +45 36 99 00 00
Mobile +45 31 17 10 07
Email: christian.sonne.jen...@infopaq.com
Web: www.infopaq.com



Re: Sharding and performance testing question.

2012-08-29 Thread Alexey Serba
> Any tips on load testing solr? Ideally we would like caching to not effect
> the result as much as possible.

1. Siege tool
This is probably the simplest option. You can generate a urls.txt file
and pass it to the tool. You should also capture server performance
(CPU, memory, qps, etc.) using tools like New Relic, Zabbix, etc.

2. SolrMeter
http://code.google.com/p/solrmeter/

3. Solr benchmark module (not committed yet)
It lets you run complex benchmarks using different algorithms:
* https://issues.apache.org/jira/browse/SOLR-2646
* 
http://searchhub.org/dev/2011/07/11/benchmarking-the-new-solr-near-realtime-improvements/
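
If it helps, here is a minimal sketch for option 1 (the terms.txt input file, 
core name and URL are assumptions, not from the thread) that turns a list of 
query terms into the urls.txt file siege consumes:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.net.URLEncoder;

public class UrlsTxtGenerator {
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader("terms.txt"));
        PrintWriter out = new PrintWriter(new FileWriter("urls.txt"));
        String term;
        while ((term = in.readLine()) != null) {
            // one select URL per line, which is the format siege expects
            out.println("http://localhost:8983/solr/select?q="
                    + URLEncoder.encode(term, "UTF-8"));
        }
        in.close();
        out.close();
    }
}

Something like siege -f urls.txt -c 10 -i then replays those queries with 10 
concurrent users, picking URLs at random, which also helps keep cache hits 
from flattering the numbers.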


Re: refiltering search results

2012-08-29 Thread Ahmet Arslan


--- On Wed, 8/29/12, johannes.schwendin...@blum.com wrote:

> From: johannes.schwendin...@blum.com 
> Subject: Re: refiltering search results
> To: solr-user@lucene.apache.org
> Date: Wednesday, August 29, 2012, 8:22 AM
> The main idea is to filter results as much as possible with Solr and
> then check this result again.
> To do this I have to read some information from some fields of the
> documents in the result.
> At the moment I am trying to do this in the process method of a
> SearchComponent, but I don't even know how to get access to the search
> results or the index fields of the documents.
> I have thought of ResponseBuilder.getResults(), but after I have the
> DocListAndSet object I get stuck.


You can read field values from the DocListAndSet using the 
org.apache.solr.util.SolrPluginUtils#docListToSolrDocumentList method.
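
For context, a minimal sketch of such a component against the Solr 4.x API, 
registered after the query component in solrconfig.xml; the class name and 
field names are illustrative only, not from the thread:

import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.DocList;
import org.apache.solr.util.SolrPluginUtils;

public class RefilterComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        // nothing to prepare
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // the DocList produced by QueryComponent
        DocList docs = rb.getResults().docList;
        Set<String> fields = new HashSet<String>(Arrays.asList("id", "permissions"));
        // resolve internal doc ids to their stored field values
        SolrDocumentList documents = SolrPluginUtils.docListToSolrDocumentList(
                docs, rb.req.getSearcher(), fields, null);
        for (SolrDocument doc : documents) {
            Object permissions = doc.getFieldValue("permissions");
            // inspect the field values here and decide whether to keep the hit
        }
    }

    @Override
    public String getDescription() {
        return "re-filters results using stored field values";
    }

    @Override
    public String getSource() {
        return null;
    }
}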


how to boost given word in a field in the query parameters

2012-08-29 Thread andy
Hi All,

I am a Solr newbie and I have encountered a problem: how to boost a given
word in a field via the query parameters. The details are as follows.

My Solr schema is as follows:


The category field has certain values, for example 307, 503, 206, ...

My query looks like this: q=cell+phone&version=2.2&start=0&rows=10&indent=on
The search results fall into many categories, for example they may be in
206, 782, 307, 289.
You know the default sort depends on relevance; I want the results in
category 206 to be in front of the others.

Does anybody know how to do this?

Thanks,
Andy






Re: Injest pauses

2012-08-29 Thread Alexey Serba
Hey Brad,

> This leads me to believe that a single merge thread is blocking indexing
> from occurring.
> When this happens our producers, which distribute their updates amongst all 
> the shards, pile up on this shard and wait.
Which version of Solr are you using? Have you tried the 4.0 beta?

* 
http://searchhub.org/dev/2011/04/09/solr-dev-diary-solr-and-near-real-time-search/
* https://issues.apache.org/jira/browse/SOLR-2565

Alexey


Re: auto completion search with solr using NGrams in SOLR

2012-08-29 Thread Ahmet Arslan
> Hi,
> 
> thanks, 
> 
> I tried adding quotation marks, but it still gives the same results.
> 
> http://localhost:8080/test/suggest/?q="michael f"

Looking back at your field type definition, I saw that you have defined the 
filter in the query analyzer. Move it into the index analyzer. Restart 
Solr, re-index, and suggest/?q="michael f" should return the expected results.


Re: how to boost given word in a field in the query parameters

2012-08-29 Thread Ahmet Arslan
> The category field has certain values, for example 307, 503, 206, ...
> 
> My query looks like this: q=cell+phone&version=2.2&start=0&rows=10&indent=on
> The search results fall into many categories, for example they may be in
> 206, 782, 307, 289.
> You know the default sort depends on relevance; I want the results in
> category 206 to be in front of the others.

One way to do this is to add a clause to your query:

q=cell phone category:206^100


Re: Custom close to index metadata / pass commit data to writer.commit

2012-08-29 Thread Jozef Vilcek
Hi,

I just wanted to check if someone has an idea about the intentions behind
this issue:
https://issues.apache.org/jira/browse/SOLR-2701

It is marked for 4.0-Alpha and there is already a Beta out there.
Can anyone tell me whether it is planned to be part of the 4.0 release?

Best,
Jozef

On Sun, Jun 24, 2012 at 1:18 AM, Erick Erickson  wrote:
> see: https://issues.apache.org/jira/browse/SOLR-2701.
>
> But there's an easier alternative. Just have a _very special_ document
> with a known uniqueKey that you index at the end of the run and that
> 1> has no fields in common with any other document (except uniqueKey)
> 2> contains whatever data you want to carry around in whatever format you 
> want.
>
> Now whenever you query for that document by ID, you get your info. And
> since you can't search the doc until after it's been committed, you know
> that the preceding documents have all been persisted
>
> Of course whenever you send a new version of the doc it will overwrite the
> one before, since it has the same uniqueKey.
>
> Best
> Erick
>
> On Fri, Jun 22, 2012 at 5:34 AM, Jozef Vilcek  wrote:
>> Hi everyone,
>>
>> I am seeking a solution to store some custom data very close to /
>> within the index. I have found a possibility to pass commit "user" data to
>> the IndexWriter:
>> http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/index/IndexWriter.html#commit(java.util.Map)
>> which are from what I understand stored somewhere close to segments
>> "metadata" like index version, generation, ...
>>
>> Now, I see no easy way to accumulate and pass along such data with
>> Solr 3.6. DirectUpdateHandler2 commits implicitly via close rather than
>> invoking the commit API. I can extend DirectUpdateHandler2 and alter the
>> closeWriter method, but still ... I am not yet clear how to pass along
>> request-level params, which are not available at the DirectUpdateHandler2
>> level. It seems that passing commitData is not supported (maybe not wanted
>> by design) and is not going to be: when I look at Solr trunk, I see the
>> implicit commit removed and writer.commit called with commitData, but no
>> easy way to pass custom commit data nor to easily hook in.
>>
>> Any recommendations for how to store some data close to index?
>>
>> To throw some light on why I want this ... Basically I want to store
>> there some kind of time stamp which defines what is already in the
>> index with respect to feeding updates from the external world. Now, my
>> index is replicated to another index instance in a different data center
>> (serving traffic as well). When the default document feed in DC1 goes
>> south for some reason, the backup in DC2 jumps in to keep updates alive
>> ... but it has to know from where the feed should start ... that would be
>> the kind of time stamp stored and replicated with the index.
>>
>> Many thanks in advance.
>>
>> Best,
>> Jozef
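
For what it's worth, Erick's marker-document trick needs only a few lines of
SolrJ. A hedged sketch (SolrJ 4.x API; the field names, URL and timestamp
value are assumptions, not from the thread):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

public class FeedStateMarker {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        // index the marker last; its fixed uniqueKey makes each run overwrite the previous one
        SolrInputDocument marker = new SolrInputDocument();
        marker.addField("id", "feed-state-marker");
        marker.addField("feed_timestamp_s", "2012-08-29T12:00:00Z");
        server.add(marker);
        server.commit(); // once the marker is searchable, everything indexed before it is persisted

        // later, possibly in the other data center after replication:
        QueryResponse rsp = server.query(new SolrQuery("id:feed-state-marker"));
        String lastFed = (String) rsp.getResults().get(0).getFieldValue("feed_timestamp_s");
        System.out.println("resume feed from: " + lastFed);
    }
}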


Re: how to boost given word in a field in the query parameters

2012-08-29 Thread andy
Hi iorixxx,

Thanks for your reply. If I insert the clause category:206^100, will the
search results only include the results in category 206?

 
iorixxx wrote
> 
>> The category field has certain values, for example 307, 503, 206, ...
>> 
>> My query looks like this: q=cell+phone&version=2.2&start=0&rows=10&indent=on
>> The search results fall into many categories, for example they may be in
>> 206, 782, 307, 289.
>> You know the default sort depends on relevance; I want the results in
>> category 206 to be in front of the others.
> 
> One way to do this is to add a clause to your query:
> 
> q=cell phone category:206^100
> 





Re: refiltering search results

2012-08-29 Thread Johannes . Schwendinger
From:
Ahmet Arslan 
To:
solr-user@lucene.apache.org
Date:
29.08.2012 10:50
Subject:
Re: refiltering search results


Thanks for the answer.

My next question is: how can I filter the result, or how do I replace the old
ResponseBuilder result with a new one?


--- On Wed, 8/29/12, johannes.schwendin...@blum.com wrote:

> From: johannes.schwendin...@blum.com 
> Subject: Re: refiltering search results
> To: solr-user@lucene.apache.org
> Date: Wednesday, August 29, 2012, 8:22 AM
> The main idea is to filter results as much as possible with Solr and
> then check this result again.
> To do this I have to read some information from some fields of the
> documents in the result.
> At the moment I am trying to do this in the process method of a
> SearchComponent, but I don't even know how to get access to the search
> results or the index fields of the documents.
> I have thought of ResponseBuilder.getResults(), but after I have the
> DocListAndSet object I get stuck.


You can read field values from the DocListAndSet using the 
org.apache.solr.util.SolrPluginUtils#docListToSolrDocumentList method.



Re: how to boost given word in a field in the query parameters

2012-08-29 Thread Ahmet Arslan

> Thanks for your reply, if I insert the clause
> category:206^100 , the search
> result will only include the results in category 206
> ?

It will be an optional clause, unless you have set the default operator to
AND somewhere.

Search results will contain all categories, but 206 will be boosted.




Multiple Versions getting formed while replicating in solr 1.4.1

2012-08-29 Thread mechravi25
Hi,


I'm using Solr 1.4.1 and I have the following configuration for
replication on the master and the slave.

Solrconfig.xml (master)

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

SolrConfig.xml (slave)

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://localhost:8982/solr/corez/replication</str>
  </lst>
</requestHandler>

The master is used to index the data, and search requests are served by the
slave. The data can also be pushed into the slave from the UI (on an
as-needed basis).

At times we find that the replicated folder on the slave has the
index.<timestamp> format, and when this occurs some segments are not
replicated to the slave. Because of this, searches are not returning the
proper results.


When I searched regarding this issue I found various answers to it. Can you
tell me which of the issues listed below is the root cause?


1. Can it be due to the following issue raised in Solr?
   https://issues.apache.org/jira/browse/SOLR-1781

2. Can this occur due to the slave having additional data (which is being
pushed from UI)?

3. Can this occur due to solr warming as described in the below forum post?
  
http://lucene.472066.n3.nabble.com/Solr-warming-when-using-master-slave-replication-td3293838.html

4. Can this be due to the segments.gen file not being copied properly as
given in the below post?
  
http://lucene.472066.n3.nabble.com/Lucene-FieldCache-memory-requirements-td484731i20.html

Can you please guide me to the root of this problem?

Thanks  






Re: auto completion search with solr using NGrams in SOLR

2012-08-29 Thread aniljayanti
Hi,

thanks for your reply,

I don't know how to remove multiple white spaces using a regex in the search
text. Can you share that with me?

Thanks,

AnilJayanti





LateBinding

2012-08-29 Thread Johannes . Schwendinger
Hello,

Has anyone ever implemented the security feature called late binding?

I am trying this, but I am very new to Solr and would be very glad to get
some hints on this.

Regards,
Johannes

Unexcpected RuntimeException when indexing with Solr 4.0 Beta

2012-08-29 Thread Alexander Cougarman
Hi. I'm using Solr 4.0 Beta (no modifications to default installation) to 
index, and it's blowing up on some Word docs:

  curl "http://localhost:8983/solr/update/extract?literal.id=doc15&commit=true"; 
-F "myfile=@15.doc"

Here's the exception. And the same files go through Solr 3.6.1 just fine.



<int name="status">500</int><int name="QTime">18</int>
<str name="msg">org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@328c62ce</str>
org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@328c62ce
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:230)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
        at org.eclipse.jetty.server.Server.handle(Server.java:351)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
        at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
        at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:642)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
        at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
        at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
        at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@328c62ce
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:224)
        ... 31 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
        at org.apache.poi.util.LittleEndian.getInt(LittleEndian.java:163)
        at org.apache.poi.hwpf.model.Colorref.<init>(Colorref.java:81)
        at org.apache.poi.hwpf.model.types.SHDAbstractType.fillFields(SHDAbstractType.java:56)
        at org.apache.poi.hwpf.usermodel.ShadingDescriptor.<init>(ShadingDescriptor.java:38)
        at org.apache.poi.hwpf.sprm.CharacterSprmUncompressor.unCompressCHPOperation(

Re: LateBinding

2012-08-29 Thread Alexey Serba
http://searchhub.org/dev/2012/02/22/custom-security-filtering-in-solr/

See section about PostFilter.

On Wed, Aug 29, 2012 at 4:43 PM,   wrote:
> Hello,
>
> Has anyone ever implemented the security feature called late binding?
>
> I am trying this, but I am very new to Solr and would be very glad to get
> some hints on this.
>
> Regards,
> Johannes
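
To sketch what the PostFilter route from that article looks like in code
(Solr 4.x API; the permission check is a placeholder for whatever external
security system late binding consults):

import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

public class LateBindingFilter extends ExtendedQueryBase implements PostFilter {

    @Override
    public boolean getCache() {
        return false; // per-user decisions must not be cached
    }

    @Override
    public int getCost() {
        return Math.max(super.getCost(), 100); // cost >= 100 marks this as a post filter
    }

    @Override
    public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
        return new DelegatingCollector() {
            @Override
            public void collect(int doc) throws IOException {
                if (isVisibleToCurrentUser(doc)) {
                    super.collect(doc); // only authorized hits flow to the next collector
                }
            }
        };
    }

    private boolean isVisibleToCurrentUser(int doc) {
        // placeholder: look up the document's ACL and ask the security system
        return true;
    }
}

Such a query is usually produced by a small custom QParserPlugin and attached
as a filter query; the cache=false and cost >= 100 settings are what make Solr
run it after the main query and the other filters.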


Re: Unexcpected RuntimeException when indexing with Solr 4.0 Beta

2012-08-29 Thread Jack Krupansky

Sounds like this POI bug (SolrCell invokes Tika which invokes POI):
https://issues.apache.org/bugzilla/show_bug.cgi?id=53380

Are these in fact Office 97 documents that are failing?

Solr 4.0 includes Tika 1.1, while Solr 3.6.1 includes Tika 1.0.

It may be possible for you to drop the old Tika 1.0 into Solr 4.0, but I 
wouldn't try to guarantee that.


In any case, this should be filed in Jira as a bug in Solr 4.0-BETA 
(SolrCell/Extraction component).


-- Jack Krupansky

-Original Message- 
From: Alexander Cougarman

Sent: Wednesday, August 29, 2012 9:05 AM
To: solr-user@lucene.apache.org
Subject: Unexcpected RuntimeException when indexing with Solr 4.0 Beta

Hi. I'm using Solr 4.0 Beta (no modifications to default installation) to 
index, and it's blowing up on some Word docs:


 curl "http://localhost:8983/solr/update/extract?literal.id=doc15&commit=true" -F "myfile=@15.doc"


Here's the exception. And the same files go through Solr 3.6.1 just fine.

   
   
   500name="QTime">18   >name="msg">org.apache.tika.exception.TikaException
   : Unexpected RuntimeException from 
org.apache.tika.parser.microsoft.OfficeParser
   @328c62ceorg.apache.solr.common.SolrException: 
org.apach
   e.tika.exception.TikaException: Unexpected RuntimeException from 
org.apache.tika

   .parser.microsoft.OfficeParser@328c62ce
   at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(Extr

   actingDocumentLoader.java:230)
   at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(Co

   ntentStreamHandlerBase.java:74)
   at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandl

   erBase.java:129)
   at 
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handle

   Request(RequestHandlers.java:240)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
   at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter

   .java:454)
   at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilte

   r.java:275)
   at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(Servlet

   Handler.java:1337)
   at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java

   :484)
   at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.j

   ava:119)
   at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
   at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandl

   er.java:233)
   at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandl

   er.java:1065)
   at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:

   413)
   at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandle

   r.java:192)
   at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandle

   r.java:999)
   at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.j

   ava:117)
   at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(Cont

   extHandlerCollection.java:250)
   at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerColl

   ection.java:149)
   at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper

   .java:111)
   at org.eclipse.jetty.server.Server.handle(Server.java:351)
   at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(Abstrac

   tHttpConnection.java:454)
   at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(Blockin

   gHttpConnection.java:47)
   at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(Abstra

   ctHttpConnection.java:890)
   at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.header

   Complete(AbstractHttpConnection.java:944)
   at 
org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:642)
   at 
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)


   at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpCo

   nnection.java:66)
   at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(So

   cketConnector.java:254)
   at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPoo

   l.java:599)
   at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool

   .java:534)
   at java.lang.Thread.run(Unknown Source)
   Caused by: org.apache.tika.exception.TikaException: Unexpected 
RuntimeException

   from org.apache.tika.parser.microsoft.OfficeParser@328c62ce
   at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244

   )
   at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242

   )
   at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:

RE: Unexcpected RuntimeException when indexing with Solr 4.0 Beta

2012-08-29 Thread Alexander Cougarman
I believe these are the older Word 97 (*.doc) files. The problem was that
Solr 3.6.1 blew up on *.MSG files when doing extractOnly=true. So we upgraded
to Solr 4.0 and now run into this; if we use Tika 1.0, I'm afraid the DOC
files will be fixed but the MSG files will break!

Sincerely,
Alex Cougarman

Bahá'í World Centre
Haifa, Israel
Office: +972-4-835-8683 
Cell: +972-54-241-4742
acoug...@bwc.org  


-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: 29 August 2012 4:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Unexcpected RuntimeException when indexing with Solr 4.0 Beta

Sounds like this POI bug (SolrCell invokes Tika which invokes POI):
https://issues.apache.org/bugzilla/show_bug.cgi?id=53380

Are these in fact Office 97 documents that are failing?

Solr 4.0 includes Tika 1.1, while Solr 3.6.1 includes Tika 1.0.

It may be possible for you to drop the old Tika 1.0 into Solr 4.0, but I 
wouldn't try to guarantee that.

In any case, this should be filed in Jira as a bug in Solr 4.0-BETA 
(SolrCell/Extraction component).

-- Jack Krupansky

-Original Message-
From: Alexander Cougarman
Sent: Wednesday, August 29, 2012 9:05 AM
To: solr-user@lucene.apache.org
Subject: Unexcpected RuntimeException when indexing with Solr 4.0 Beta

Hi. I'm using Solr 4.0 Beta (no modifications to default installation) to 
index, and it's blowing up on some Word docs:

  curl "http://localhost:8983/solr/update/extract?literal.id=doc15&commit=true" -F "myfile=@15.doc"

Here's the exception. And the same files go through Solr 3.6.1 just fine.




Re: Sharding and performance testing question.

2012-08-29 Thread Tiernan OToole
Thanks for the tips! Will check out those links and see what I can find!

On Wed, Aug 29, 2012 at 9:44 AM, Alexey Serba  wrote:

> > Any tips on load testing Solr? Ideally we would like caching to
> > affect the results as little as possible.
>
> 1. Siege tool
> This is probably the simplest option. You can generate a urls.txt file
> and pass it to the tool. You should also capture server performance
> (CPU, memory, qps, etc.) using tools like New Relic, Zabbix, etc.
>
> 2. SolrMeter
> http://code.google.com/p/solrmeter/
>
> 3. Solr benchmark module (not committed yet)
> It lets you run complex benchmarks using different algorithms:
> * https://issues.apache.org/jira/browse/SOLR-2646
> *
> http://searchhub.org/dev/2011/07/11/benchmarking-the-new-solr-near-realtime-improvements/
>



-- 
Tiernan O'Toole
blog.lotas-smartman.net
www.geekphotographer.com
www.tiernanotoole.ie


RE: Injest pauses

2012-08-29 Thread Voth, Brad (GE Corporate)
Very interesting links; after much more digging yesterday, this appears to be
exactly what I'm seeing.

I am using the 4.0 beta currently for my testing. FWIW, I've also pulled
trunk from svn as of yesterday and experienced the same issue.

From: Alexey Serba [ase...@gmail.com]
Sent: Wednesday, August 29, 2012 6:45 AM
To: solr-user@lucene.apache.org
Subject: Re: Injest pauses

Hey Brad,

> This leads me to believe that a single merge thread is blocking indexing
> from occurring.
> When this happens our producers, which distribute their updates amongst all 
> the shards, pile up on this shard and wait.
Which version of Solr are you using? Have you tried the 4.0 beta?

* 
http://searchhub.org/dev/2011/04/09/solr-dev-diary-solr-and-near-real-time-search/
* https://issues.apache.org/jira/browse/SOLR-2565

Alexey


Re: Solr 4.0 - Join performance

2012-08-29 Thread David Smiley (@MITRE.org)
Solr 4 is certainly the goal.  There's a bit of a setback at the moment until 
some of the Lucene spatial API is re-thought.  I'm working heavily on such 
things this week.
~ David

On Aug 28, 2012, at 6:22 PM, Eric Khoury [via Lucene] wrote:


David, Solr support for this will come in SOLR-3304 I suppose?
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
Any idea if this is going to make it into Solr 4.0? Thanks, Eric.
> Date: Wed, 15 Aug 2012 07:07:21 -0700

> From: [hidden 
> email]
> To: [hidden email]
> Subject: RE: Solr 4.0 - Join performance
>
> You would index rectangles of 0 height but that have a left edge 'x' of the
> start time and a right edge 'x' of your end time.  You can index a variable
> number of these per Solr document and then query by either a point or
> another rectangle to find documents which intersect your query shape.  It
> can't do a completely within based query, just intersection for now.  I
> really look forward to seeing this wrapped up in some sort of RangeFieldType
> so that users don't have to think in spatial terms.
>
>
>
> -
>  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book








-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book

Re: Unexcpected RuntimeException when indexing with Solr 4.0 Beta

2012-08-29 Thread Jack Krupansky
Understood. Well, you could always manually convert old docs to a newer doc 
format. Or use a tool such as:

http://download.cnet.com/Docx-to-Doc-Converter/3000-2079_4-75206386.html

-- Jack Krupansky

-Original Message- 
From: Alexander Cougarman

Sent: Wednesday, August 29, 2012 9:59 AM
To: 'solr-user@lucene.apache.org'
Subject: RE: Unexcpected RuntimeException when indexing with Solr 4.0 Beta

I believe these are the older Word 97 docs (*.doc) files. The problem was 
that Solr 3.6.1 blew up on *.MSG files when doing extractOnly=true. So we 
upgraded to Solr 4.0, and now run into this; if we use Tika 1.0, I'm afraid 
the DOC files will be fixed but the MSG files will break!


Sincerely,
Alex Cougarman

Bahá'í World Centre
Haifa, Israel
Office: +972-4-835-8683
Cell: +972-54-241-4742
acoug...@bwc.org


-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: 29 August 2012 4:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Unexcpected RuntimeException when indexing with Solr 4.0 Beta

Sounds like this POI bug (SolrCell invokes Tika which invokes POI):
https://issues.apache.org/bugzilla/show_bug.cgi?id=53380

Are these in fact Office 97 documents that are failing?

Solr 4.0 includes Tika 1.1, while Solr 3.6.1 includes Tika 1.0.

It may be possible for you to drop the old Tika 1.0 into Solr 4.0, but I 
wouldn't try to guarantee that.


In any case, this should be filed in Jira as a bug in Solr 4.0-BETA 
(SolrCell/Extraction component).


-- Jack Krupansky

-Original Message-
From: Alexander Cougarman
Sent: Wednesday, August 29, 2012 9:05 AM
To: solr-user@lucene.apache.org
Subject: Unexcpected RuntimeException when indexing with Solr 4.0 Beta

Hi. I'm using Solr 4.0 Beta (no modifications to default installation) to 
index, and it's blowing up on some Word docs:


 curl "http://localhost:8983/solr/update/extract?literal.id=doc15&commit=true" -F "myfile=@15.doc"


Here's the exception. And the same files go through Solr 3.6.1 just fine.

   
   

Re: how to boost given word in a field in the query parameters

2012-08-29 Thread andy


Thanks.
Yes, my default operator is AND. If I use the OR operator like this:
q=cell phone OR category:206^100, the results will be more than for the query
q=cell phone; something in category 206 which doesn't contain the cell phone
keywords may be included. This is really a tickler for me.



iorixxx wrote
> 
>> Thanks for your reply. If I insert the clause category:206^100, will the
>> search results only include the results in category 206?
> 
> It will be an optional clause, unless you have set the default operator to
> AND somewhere.
> 
> Search results will contain all categories, but 206 will be boosted.
> 




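
One common way around that tickler, not suggested in the thread: with the
dismax/edismax parsers, a boost query (bq) influences ranking only and never
matches documents by itself, so an AND default operator cannot pull in extra
category-206 documents. A hedged SolrJ sketch (the URL and field names are
assumptions):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class BoostedCategoryQuery {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("cell phone");
        q.set("defType", "edismax");
        q.set("qf", "name description");  // assumed search fields
        q.set("bq", "category:206^100");  // boosts 206 hits, adds no new matches
        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getResults().getNumFound());
    }
}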


RE: Solr 4.0 - Join performance

2012-08-29 Thread Eric Khoury

Awesome, thanks David. In the meantime, could I potentially use geohash, or
something similar? Geohash looks like it supports separate "lon" or "lat"
range queries, which would help, but it's not a multivalued field, which I need.
 > Date: Wed, 29 Aug 2012 07:20:42 -0700
> From: dsmi...@mitre.org
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 4.0 - Join performance
> 
> Solr 4 is certainly the goal.  There's a bit of a setback at the moment until 
> some of the Lucene spatial API is re-thought.  I'm working heavily on such 
> things this week.
> ~ David
> 
> On Aug 28, 2012, at 6:22 PM, Eric Khoury [via Lucene] wrote:
> 
> 
> David, Solr support for this will come in SOLR-3304 I suppose?
> http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
> Any idea if this is going to make it into Solr 4.0? Thanks, Eric.
>  > Date: Wed, 15 Aug 2012 07:07:21 -0700
> 
> > From: [hidden 
> > email]
> > To: [hidden 
> > email]
> > Subject: RE: Solr 4.0 - Join performance
> >
> > You would index rectangles of 0 height but that have a left edge 'x' of the
> > start time and a right edge 'x' of your end time.  You can index a variable
> > number of these per Solr document and then query by either a point or
> > another rectangle to find documents which intersect your query shape.  It
> > can't do a completely within based query, just intersection for now.  I
> > really look forward to seeing this wrapped up in some sort of RangeFieldType
> > so that users don't have to think in spatial terms.
> >
> >
> >
> > -
> >  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> 
> 
> 
> 
> 
> 
> 
> 
> -
>  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book

RE: Unexcpected RuntimeException when indexing with Solr 4.0 Beta

2012-08-29 Thread Alexander Cougarman
Thanks, Jack. Another solution: Use two instances of Solr on separate ports -- 
3.6.1 and 4.0. Use an IF statement to send the file to the proper instance :)

Sincerely,
Alex Cougarman

Bahá'í World Centre
Haifa, Israel
Office: +972-4-835-8683 
Cell: +972-54-241-4742
acoug...@bwc.org  


-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: 29 August 2012 5:39 PM
To: solr-user@lucene.apache.org
Subject: Re: Unexcpected RuntimeException when indexing with Solr 4.0 Beta

Understood. Well, you could always manually convert old docs to a newer doc 
format. Or use a tool such as:
http://download.cnet.com/Docx-to-Doc-Converter/3000-2079_4-75206386.html

-- Jack Krupansky

-Original Message-
From: Alexander Cougarman
Sent: Wednesday, August 29, 2012 9:59 AM
To: 'solr-user@lucene.apache.org'
Subject: RE: Unexcpected RuntimeException when indexing with Solr 4.0 Beta

I believe these are the older Word 97 docs (*.doc) files. The problem was that 
Solr 3.6.1 blew up on *.MSG files when doing extractOnly=true. So we upgraded 
to Solr 4.0, and now run into this; if we use Tika 1.0, I'm afraid the DOC 
files will be fixed but the MSG files will break!

Sincerely,
Alex Cougarman

Bahá'í World Centre
Haifa, Israel
Office: +972-4-835-8683
Cell: +972-54-241-4742
acoug...@bwc.org


-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: 29 August 2012 4:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Unexcpected RuntimeException when indexing with Solr 4.0 Beta

Sounds like this POI bug (SolrCell invokes Tika which invokes POI):
https://issues.apache.org/bugzilla/show_bug.cgi?id=53380

Are these in fact Office 97 documents that are failing?

Solr 4.0 includes Tika 1.1, while Solr 3.6.1 includes Tika 1.0.

It may be possible for you to drop the old Tika 1.0 into Solr 4.0, but I 
wouldn't try to guarantee that.

In any case, this should be filed in Jira as a bug in Solr 4.0-BETA 
(SolrCell/Extraction component).

-- Jack Krupansky

-Original Message-
From: Alexander Cougarman
Sent: Wednesday, August 29, 2012 9:05 AM
To: solr-user@lucene.apache.org
Subject: Unexcpected RuntimeException when indexing with Solr 4.0 Beta

Hi. I'm using Solr 4.0 Beta (no modifications to default installation) to 
index, and it's blowing up on some Word docs:

  curl "http://localhost:8983/solr/update/extract?literal.id=doc15&commit=true" -F "myfile=@15.doc"

Here's the exception. And the same files go through Solr 3.6.1 just fine.




Re: how to boost given word in a field in the query parameters

2012-08-29 Thread andy
I got it. Thanks for your kind reply!


iorixxx wrote
> 
>> Thanks for your reply. If I insert the clause category:206^100, will the
>> search results only include the results in category 206?
> 
> It will be an optional clause, unless you have set the default operator to
> AND somewhere.
> 
> Search results will contain all categories, but 206 will be boosted.
> 






Re: Solr 4.0 - Join performance

2012-08-29 Thread David Smiley (@MITRE.org)
The solr.GeoHashFieldType is useless; I'd like to see it deprecated then 
removed.  You'll need to go with unreleased code and apply patches or wait till 
Solr 4.

~ David

On Aug 29, 2012, at 10:53 AM, Eric Khoury [via Lucene] wrote:


Awesome, thanks David. In the meantime, could I potentially use geohash, or
something similar? Geohash looks like it supports separate "lon" or "lat"
range queries, which would help, but it's not a multivalued field, which I need.
 > Date: Wed, 29 Aug 2012 07:20:42 -0700

> From: [hidden 
> email]
> To: [hidden email]
> Subject: Re: Solr 4.0 - Join performance
>
> Solr 4 is certainly the goal.  There's a bit of a setback at the moment until 
> some of the Lucene spatial API is re-thought.  I'm working heavily on such 
> things this week.
> ~ David
>
> On Aug 28, 2012, at 6:22 PM, Eric Khoury [via Lucene] wrote:
>
>
> David, Solr support for this will come in SOLR-3304 I suppose?
> http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
> Any idea if this is going to make it into Solr 4.0? Thanks, Eric.
>  > Date: Wed, 15 Aug 2012 07:07:21 -0700
>
> > From: [hidden 
> > email]
> > To: [hidden 
> > email]
> > Subject: RE: Solr 4.0 - Join performance
> >
> > You would index rectangles of 0 height but that have a left edge 'x' of the
> > start time and a right edge 'x' of your end time.  You can index a variable
> > number of these per Solr document and then query by either a point or
> > another rectangle to find documents which intersect your query shape.  It
> > can't do a completely within based query, just intersection for now.  I
> > really look forward to seeing this wrapped up in some sort of RangeFieldType
> > so that users don't have to think in spatial terms.
> >
> >
> >
> > -
> >  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
>
>
> 
> If you reply to this email, your message will be added to the discussion 
> below:
>
> NAML
>
>
>
>
>
> -
>  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book








-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book

Hierarchical faceting and filter query exclusions

2012-08-29 Thread Nicholas Swarr
We're using Solr 4.0 Beta, testing the hierarchical faceting support to see if 
it's a good fit to facet on taxonomies.  One issue we've encountered is that we 
can't apply filter exclusions to the hierarchical facets so as to preserve 
facet count with multi-select.  I haven't been able to locate or otherwise 
determine if there's documentation that would outline how this is done.  We've 
tried a few things with local params but it appears those aren't parsed with 
the facet.pivot argument.  I found this ticket related to that:

https://issues.apache.org/jira/browse/SOLR-2255

Could anyone offer some insight or guidance on this?

Thanks,
Nick




RE: Injest pauses

2012-08-29 Thread Voth, Brad (GE Corporate)
Does anyone know the actual status of SOLR-2565? It looks to be marked as
resolved in 4.*, but I am still seeing long pauses during commits using 4.*.

I am currently digging through the code to see what I can find, but Java not
being my primary (or secondary) language, it is mostly slow going.

-Original Message-
From: Voth, Brad (GE Corporate) 
Sent: Wednesday, August 29, 2012 10:17 AM
To: solr-user@lucene.apache.org
Subject: RE: Injest pauses

Very interesting links, after much more digging yesterday this appears to be 
exactly what I'm seeing.

I am using 4.0 beta currently for my testing.  FWIW I've also pulled trunk from 
svn  as of yesterday and experienced the same issue.

From: Alexey Serba [ase...@gmail.com]
Sent: Wednesday, August 29, 2012 6:45 AM
To: solr-user@lucene.apache.org
Subject: Re: Injest pauses

Hey Brad,

> This leads me to believe that a single merge thread is blocking indexing
> from occurring.
> When this happens our producers, which distribute their updates amongst all 
> the shards, pile up on this shard and wait.
Which version of Solr are you using? Have you tried the 4.0 beta?

* 
http://searchhub.org/dev/2011/04/09/solr-dev-diary-solr-and-near-real-time-search/
* https://issues.apache.org/jira/browse/SOLR-2565

Alexey


Re: Injest pauses

2012-08-29 Thread Yonik Seeley
On Wed, Aug 29, 2012 at 11:58 AM, Voth, Brad (GE Corporate)
 wrote:
> Does anyone know the actual status of SOLR-2565? It looks to be marked as
> resolved in 4.*, but I am still seeing long pauses during commits using 4.*

SOLR-2565 is definitely committed - adds are no longer blocked by
commits (at least at the Solr level).

-Yonik
http://lucidworks.com


RE: Injest pauses

2012-08-29 Thread Voth, Brad (GE Corporate)
Thanks, I'll continue with my testing and tracking down the block.

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Wednesday, August 29, 2012 12:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Injest pauses

On Wed, Aug 29, 2012 at 11:58 AM, Voth, Brad (GE Corporate)  
wrote:
> Does anyone know the actual status of SOLR-2565? It looks to be marked as
> resolved in 4.*, but I am still seeing long pauses during commits using
> 4.*

SOLR-2565 is definitely committed - adds are no longer blocked by commits (at 
least at the Solr level).

-Yonik
http://lucidworks.com


RE: Solr 4.0 - Join performance

2012-08-29 Thread Eric Khoury

Thanks David, will work around this issue for now, and will keep an eye out
for changes to SOLR-3304. Good luck with the rethink. Eric.
 > Date: Wed, 29 Aug 2012 08:44:14 -0700
> From: dsmi...@mitre.org
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 4.0 - Join performance
> 
> The solr.GeoHashFieldType is useless; I'd like to see it deprecated then 
> removed.  You'll need to go with unreleased code and apply patches or wait 
> till Solr 4.
> 
> ~ David
> 
> On Aug 29, 2012, at 10:53 AM, Eric Khoury [via Lucene] wrote:
> 
> 
> Awesome, thanks David. In the meantime, could I potentially use geohash, or
> something similar? Geohash looks like it supports separate "lon" or "lat"
> range queries, which would help, but it's not a multivalued field, which I need.
>  > Date: Wed, 29 Aug 2012 07:20:42 -0700
> 
> > From: [hidden 
> > email]
> > To: [hidden 
> > email]
> > Subject: Re: Solr 4.0 - Join performance
> >
> > Solr 4 is certainly the goal.  There's a bit of a setback at the moment 
> > until some of the Lucene spatial API is re-thought.  I'm working heavily on 
> > such things this week.
> > ~ David
> >
> > On Aug 28, 2012, at 6:22 PM, Eric Khoury [via Lucene] wrote:
> >
> >
> > David, Solr support for this will come in SOLR-3304 I suppose?
> > http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
> > Any idea if this is going to make it into Solr 4.0? Thanks, Eric.
> >  > Date: Wed, 15 Aug 2012 07:07:21 -0700
> >
> > > From: [hidden 
> > > email]
> > > To: [hidden 
> > > email]
> > > Subject: RE: Solr 4.0 - Join performance
> > >
> > > You would index rectangles of 0 height but that have a left edge 'x' of 
> > > the
> > > start time and a right edge 'x' of your end time.  You can index a 
> > > variable
> > > number of these per Solr document and then query by either a point or
> > > another rectangle to find documents which intersect your query shape.  It
> > > can't do a completely within based query, just intersection for now.  I
> > > really look forward to seeing this wrapped up in some sort of 
> > > RangeFieldType
> > > so that users don't have to think in spatial terms.
> > >
> > >
> > >
> > > -
> > >  Author: 
> > > http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> >
> >
> > 
> >
> >
> >
> >
> >
> > -
> >  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> 
> 
> 
> 
> 
> 
> 
> 
> -
>  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
  

RE: Injest pauses

2012-08-29 Thread Voth, Brad (GE Corporate)
Interestingly, it is not pausing during every commit, so at least a portion
of the time the async commit code is working. I'm trying to track down the
case where a wait would still be issued.

-Original Message-
From: Voth, Brad (GE Corporate) 
Sent: Wednesday, August 29, 2012 12:32 PM
To: solr-user@lucene.apache.org
Subject: RE: Injest pauses

Thanks, I'll continue with my testing and tracking down the block.

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Wednesday, August 29, 2012 12:28 PM
To: solr-user@lucene.apache.org
Subject: Re: Injest pauses

On Wed, Aug 29, 2012 at 11:58 AM, Voth, Brad (GE Corporate)  
wrote:
> Anyone know the actual status of SOLR-2565, it looks to be marked as 
> resolved in 4.* but I am still seeing long pauses during commits using
> 4.*

SOLR-2565 is definitely committed - adds are no longer blocked by commits (at 
least at the Solr level).

-Yonik
http://lucidworks.com


Re: AW: AW: auto completion search with solr using NGrams in SOLR

2012-08-29 Thread Kiran Jayakumar
You need this for both index and query:



http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternReplaceFilterFactory
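A minimal sketch of such an analyzer chain (the filter element itself was
stripped by the archive, so the type name and pattern here are assumptions):

  <fieldType name="text_suggest" class="solr.TextField">
    <analyzer>
      <!-- keep the whole input as one token for suggester-style matching -->
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- collapse any run of whitespace to a single space -->
      <filter class="solr.PatternReplaceFilterFactory"
              pattern="\s+" replacement=" " replace="all"/>
    </analyzer>
  </fieldType>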


On Wed, Aug 29, 2012 at 4:55 AM, aniljayanti  wrote:

> Hi,
>
> thanks for your reply,
>
> I do not know how to remove multiple white spaces using a regex in the search
> text. Can you share an example?
>
> Thanks,
>
> AnilJayanti
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p4003991.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Solr and query abortion

2012-08-29 Thread Aleksey Vorona
Hi, we are running Solr 3.6.1 and see an issue in our load tests. Some 
of the queries our load test script produces result in a huge number of 
hits, as high as 90% of all the documents we have (2.5M). Those are all 
range queries. I see in the log that those queries take much more time 
to execute.


Since such a query does not make any sense from the end user 
perspective, I would like to limit its performance impact.


Is it possible to abort the query after a certain number of document hits 
or a certain elapsed time and return an error? I would render that error as 
a "Please refine your search" message to the end user in my application. I 
know that many sites on the web do that, and I guess most of them do 
that with Solr.


I tried setting the timeAllowed limit, but, for some reason, I did not see 
those query times go down. I suspect that most of the time is spent 
not in the search phase (which is the only one respecting timeAllowed, as 
far as I know) but in the sorting phase. And still, I want to abort any 
longer-running query. Otherwise they accumulate over time, pushing the 
server's load average sky high and killing performance even for regular 
queries.
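
For reference, timeAllowed is a per-request parameter in milliseconds, e.g.
(field name and values illustrative):

  q=price:[10 TO 500]&rows=10&timeAllowed=500

When the limit is hit, Solr returns whatever it collected so far and sets
partialResults=true in the response header rather than raising an error, and
the limit only covers the collection phase mentioned above.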


-- Aleksey


Problem with copyfield wild card

2012-08-29 Thread Kiran Jayakumar
Hi everyone,

I have several fields like Something_Misc_1, Something_Misc_2,
SomeOtherThing_Misc_1,... etc.

I have defined a copy field like this:



It doesn't capture the misc fields. Am I missing something? Any help is
much appreciated.

Thanks


xinclude and relative files

2012-08-29 Thread Shawn Heisey
I found some discussion saying that starting in 3.1, files that you 
xinclude with a relative path from something like solrconfig.xml are 
relative to the location of the file with the xinclude.


I use a directory structure where solrconfig.xml in the individual core 
directory is a symlink to another directory, so that similar cores can 
share configs.  Currently those xincludes are absolute paths, but when I 
move to Solr4, I want to make them relative so that multiple config 
structures can coexist.  Will a relative xinclude start at the location 
of the symlink, or the actual file? I can't try it out at this time, 
though I will do so in a few days. If someone happens to know, it would 
help me greatly.
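
For concreteness, the kind of include in question looks like this in
solrconfig.xml ("shared/handlers.xml" is a placeholder path):

  <config>
    <xi:include href="shared/handlers.xml"
                xmlns:xi="http://www.w3.org/2001/XInclude"/>
  </config>

The open question is whether that href resolves against the symlink's
directory or against the target file's directory.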


I'd like to learn what the community thinks SHOULD happen with a 
symlinked xml file that has relative xincludes.  Should the includes be 
relative to the actual file or the symlink?  Is this a question more 
suited to the -dev list?


Thanks,
Shawn



Re: Ordering of fields

2012-08-29 Thread Yonik Seeley
In 4.0 you can use the def function with pseudo-fields (returning
function results as doc field values)
http://wiki.apache.org/solr/FunctionQuery#def

fl=a,b,c:def(myfield,10)
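
(For reference, def(myfield,10) evaluates to the value of myfield when the
document has one, and to the supplied default, 10, otherwise.)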


-Yonik
http://lucidworks.com


On Wed, Aug 29, 2012 at 2:39 PM, Rohit Harchandani  wrote:
> Hi all,
> Is there a way to specify the order in which fields are returned by solr?
> Also, is it possible to make solr return a blank/default value for a field
> not present for a particular document, apart from giving a default value in
> the schema and having it indexed?
> Thanks,
> Rohit Harchandani


Re: Load Testing in Solr

2012-08-29 Thread Aleksey Vorona

On 12-08-29 11:44 AM, dhaivat dave wrote:

Hello everyone .

Can any one know any component or tool that can be used for testing the
solr performance.


People were recommending https://code.google.com/p/solrmeter/ earlier.

-- Aleksey



RE: Cloud assigning incorrect port to shards

2012-08-29 Thread Buttler, David
I think the issue was that I didn't have a solr.xml in the solr home.  I was a 
little confused by the example directory because there are actually 5 solr.xml 
files
% find . -name solr.xml
./multicore/solr.xml
./example-DIH/solr/solr.xml
./exampledocs/solr.xml
./contexts/solr.xml
./solr/solr.xml

Creating my own jetty installation directory without the example instances led 
to me deleting the solr/solr.xml file.

I have now created a new solr home and set up a solr.xml file there, and things 
look much better.

Thanks for the feedback,
Dave



-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Thursday, August 23, 2012 6:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Cloud assigning incorrect port to shards

Can you post your solr.xml file?

On Thursday, August 23, 2012, Buttler, David wrote:

> I am using the jetty container from the example.  The only thing I 
> have done is change the schema to match up my documents rather than 
> the example
>
> -Original Message-
> From: Mark Miller [mailto:markrmil...@gmail.com ]
> Sent: Wednesday, August 22, 2012 5:50 PM
> To: solr-user@lucene.apache.org 
> Subject: Re: Cloud assigning incorrect port to shards
>
> What container are you using?
>
> Sent from my iPhone
>
> On Aug 22, 2012, at 3:14 PM, "Buttler, David" 
> >
> wrote:
>
> > Hi,
> > I have set up a Solr 4 beta cloud cluster.  I have uploaded a config
> directory, and linked it with a configuration name.
> >
> > I have started two solr on two computers and added a couple of 
> > shards
> using the Core Admin function on the admin page.
> >
> > When I go to the admin cloud view, the shards all have the computer 
> > name
> and port attached to them.  BUT, the port is the default port (8983), 
> and not the port that I assigned on the command line.  I can still 
> connect to the correct port, and not the reported port.  I anticipate 
> that this will lead to errors when I get to doing distributed query, 
> as zookeeper seems to be collecting incorrect information.
> >
> > Any thoughts as to why the incorrect port is being stored in zookeeper?
> >
> > Thanks,
> > Dave
>


--
- Mark

http://www.lucidimagination.com


Re: Problem with copyfield wild card

2012-08-29 Thread Jack Krupansky
Alas, copyField does not support full glob. Just like dynamicField, you can 
only use * at the start or end of the source field name, but not both.
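
A sketch of the constraint using the field names from the question
("misc_all" is a hypothetical destination field):

  <!-- NOT supported: a wildcard at both ends -->
  <!-- <copyField source="*_Misc_*" dest="misc_all"/> -->

  <!-- Supported: one wildcard per pattern, so enumerate the prefixes -->
  <copyField source="Something_Misc_*" dest="misc_all"/>
  <copyField source="SomeOtherThing_Misc_*" dest="misc_all"/>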


-- Jack Krupansky

-Original Message- 
From: Kiran Jayakumar

Sent: Wednesday, August 29, 2012 1:41 PM
To: solr-user@lucene.apache.org
Subject: Problem with copyfield wild card

Hi everyone,

I have several fields like Something_Misc_1, Something_Misc_2,
SomeOtherThing_Misc_1,... etc.

I have defined a copy field like this:



It doesn't capture the misc fields. Am I missing something? Any help is
much appreciated.

Thanks 



Re: Problem with copyfield wild card

2012-08-29 Thread Kiran Jayakumar
Thank you Jack.


On Wed, Aug 29, 2012 at 12:10 PM, Jack Krupansky wrote:

> Alas, copyField does not support full glob. Just like dynamicField, you
> can only use * at the start or end of the source field name, but not both.
>
> -- Jack Krupansky
>
> -Original Message- From: Kiran Jayakumar
> Sent: Wednesday, August 29, 2012 1:41 PM
> To: solr-user@lucene.apache.org
> Subject: Problem with copyfield wild card
>
>
> Hi everyone,
>
> I have several fields like Something_Misc_1, Something_Misc_2,
> SomeOtherThing_Misc_1,... etc.
>
> I have defined a copy field like this:
>
> 
>
> It doesn't capture the misc fields. Am I missing something? Any help is
> much appreciated.
>
> Thanks
>


Re: Maximum index size on single instance of Solr

2012-08-29 Thread Michael Della Bitta
Unfortunately the answer for this can vary quite a bit based on a
number of factors:

1. Whether or not fields are stored,
2. Document size,
3. Total term count,
4. Solr version

etc.

We have two major indexes, one for servicing online queries, and one
for batch processing. Our batch index is performance critical and
therefore was optimized for throughput, was stored in RAM, and has
fewer stored fields than the online query one. The batch index shards
are 25GB or less, and we're trending toward smaller and more numerous
shards. This is with 1.4, and I'm just finishing up on our migration
to 3.6.1.

Michael Della Bitta

P.S. Why'd you CC honeybadger? Honeybadger don't care...


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Wed, Aug 29, 2012 at 5:17 PM, Michael Brandt
 wrote:
> Hi all,
>
> I am looking for information on how many documents may be indexed by a
> single instance of Solr (not using shards) before performance issues are
> encountered. In searching the internet I've come across some varying
> answers; one answer suggests 50GB is problematic; this blog post on
> sharding Solr in AWS says sharding is not necessary until you have
> "millions of records," but is no more specific.
>
> What experiences have you had with this? At what point did you find it
> necessary to scale up Solr, in terms of both number of records and size of
> index (whether MB, GB, etc.)?
>
> Thanks,
> Michael Brandt


Re: LateBinding

2012-08-29 Thread Chris Hostetter

: In-Reply-To: <1346241342637-4003991.p...@n3.nabble.com>
: References: <1343815485386-3998559.p...@n3.nabble.com>
:  
:  <1343892838577-3998721.p...@n3.nabble.com>
:  
:  <1346155058429-4003689.p...@n3.nabble.com>
:  
:  <1346241342637-4003991.p...@n3.nabble.com>
: Subject: LateBinding

https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.



-Hoss


Large XML file sizes error out parsing the file size as an Integer

2012-08-29 Thread David Martin
Folks:

One of our files of XML entities for import is almost 7GB in size.

When trying to import, we error out with the exception below.  6845266984 is 
the exact size of the input file in bytes.

Shouldn't the file size be a long?  Has anybody else experienced this problem?

We plan on dividing this file into smaller pieces, but if there's another 
solution I'd love to hear it.

Thanks,

David Martin

From: Desktop mailto:dmar...@netflix.com>>
Date: Wednesday, August 29, 2012 3:17 PM
Subject: contract item assets exception

Aug 29, 2012 10:04:03 PM org.apache.solr.handler.dataimport.SolrWriter upload
WARNING: Error creating document : 
SolrInputDocument[{fileSize=fileSize(1.0)={6845266984}, 
created_by=created_by(1.0)={CHILO}, 
id=id(1.0)={movie::70018848:country_code-NO:contract_id-9979:ccm_asset_id-369161014},
 movie_id=movie_id(1.0)={70018848}, is_required=is_required(1.0)={0}, 
bcp_47_code=bcp_47_code(1.0)={nn}, 
element_category_id=element_category_id(1.0)={3}, 
updated_by=updated_by(1.0)={SYSADMIN}, 
last_updated=last_updated(1.0)={2012-08-29T19:25:21.585Z}, 
entity_type=entity_type(1.0)={CONTRACT_ITEM_ASSET}, 
country_code=country_code(1.0)={NO}, 
ccm_asset_id=ccm_asset_id(1.0)={369161014}}]
org.apache.solr.common.SolrException: ERROR: 
[doc=movie::70018848:country_code-NO:contract_id-9979:ccm_asset_id-369161014] 
Error adding field 'fileSize'='6845266984'
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:333)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at 
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:66)
at 
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:293)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:723)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:709)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426)
Caused by: java.lang.NumberFormatException: For input string: "6845266984"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:461)
at java.lang.Integer.parseInt(Integer.java:499)
at org.apache.solr.schema.TrieField.createField(TrieField.java:407)
at org.apache.solr.schema.SchemaField.createField(SchemaField.java:103)
at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:286)
... 12 more



Re: Large XML file sizes error out parsing the file size as an Integer

2012-08-29 Thread Walter Underwood
Break it up.

You'll need 7GB of RAM for the source, at least that much for the parsed 
version, at least that much for the indexes, and so on.

Why try to make something work when you aren't going to do it that way in 
production?

wunder

On Aug 29, 2012, at 3:38 PM, David Martin wrote:

> Folks:
> 
> One of our files of XML entities for import is almost 7GB in size.
> 
> When trying to import, we error out with the exception below.  6845266984 is 
> the exact size of the input file in bytes.
> 
> Shouldn't the file size be a long?  Has anybody else experienced this problem?
> 
> We plan on dividing this file into smaller pieces, but if there's another 
> solution I'd love to hear it.
> 
> Thanks,
> 
> David Martin
> 
> From: Desktop mailto:dmar...@netflix.com>>
> Date: Wednesday, August 29, 2012 3:17 PM
> Subject: contract item assets exception
> 
> Aug 29, 2012 10:04:03 PM org.apache.solr.handler.dataimport.SolrWriter upload
> WARNING: Error creating document : 
> SolrInputDocument[{fileSize=fileSize(1.0)={6845266984}, 
> created_by=created_by(1.0)={CHILO}, 
> id=id(1.0)={movie::70018848:country_code-NO:contract_id-9979:ccm_asset_id-369161014},
>  movie_id=movie_id(1.0)={70018848}, is_required=is_required(1.0)={0}, 
> bcp_47_code=bcp_47_code(1.0)={nn}, 
> element_category_id=element_category_id(1.0)={3}, 
> updated_by=updated_by(1.0)={SYSADMIN}, 
> last_updated=last_updated(1.0)={2012-08-29T19:25:21.585Z}, 
> entity_type=entity_type(1.0)={CONTRACT_ITEM_ASSET}, 
> country_code=country_code(1.0)={NO}, 
> ccm_asset_id=ccm_asset_id(1.0)={369161014}}]
> org.apache.solr.common.SolrException: ERROR: 
> [doc=movie::70018848:country_code-NO:contract_id-9979:ccm_asset_id-369161014] 
> Error adding field 'fileSize'='6845266984'
> at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:333)
> at 
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
> at 
> org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
> at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:66)
> at 
> org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:293)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:723)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:709)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:619)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:327)
> at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
> at 
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:375)
> at 
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:445)
> at 
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:426)
> Caused by: java.lang.NumberFormatException: For input string: "6845266984"
> at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
> at java.lang.Integer.parseInt(Integer.java:461)
> at java.lang.Integer.parseInt(Integer.java:499)
> at org.apache.solr.schema.TrieField.createField(TrieField.java:407)
> at org.apache.solr.schema.SchemaField.createField(SchemaField.java:103)
> at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
> at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:286)
> ... 12 more
> 






Re: Large XML file sizes error out parsing the file size as an Integer

2012-08-29 Thread Chris Hostetter

: Shouldn't the file size be a long?  Has anybody else experienced this problem?

Your problem does not appear to be any internal limitation in Solr - your 
problem appears to be that you have a field in your schema named 
"fileSize" which uses a fieldType that is a "TrieIntField", but you are 
attempting to put a value in that field that is not a legal integer.

Unless I'm missing something: if you want it to be a long, edit your 
schema to make it a long?
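
A sketch of that change (type names follow the stock example schema; an
int-based field tops out at 2147483647, which is why 6845266984 blows up):

  <!-- before -->
  <field name="fileSize" type="tint" indexed="true" stored="true"/>

  <!-- after: a long-based Trie type -->
  <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8"
             positionIncrementGap="0"/>
  <field name="fileSize" type="tlong" indexed="true" stored="true"/>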

: Caused by: java.lang.NumberFormatException: For input string: "6845266984"
: at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
: at java.lang.Integer.parseInt(Integer.java:461)
: at java.lang.Integer.parseInt(Integer.java:499)
: at org.apache.solr.schema.TrieField.createField(TrieField.java:407)
: at org.apache.solr.schema.SchemaField.createField(SchemaField.java:103)
: at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
: at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:286)

-Hoss


Re: Injest pauses

2012-08-29 Thread Alexey Serba
Could you take a jstack dump when it's happening and post it here?
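
For reference, the JDK ships jstack for this (the PID is a placeholder):

  jstack -l <solr-jvm-pid> > /tmp/solr-threads.txt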

> Interestingly it is not pausing during every commit so at least a portion of 
> the time the async commit code is working.  Trying to track down the case 
> where a wait would still be issued.
> 
> -Original Message-
> From: Voth, Brad (GE Corporate) 
> Sent: Wednesday, August 29, 2012 12:32 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Injest pauses
> 
> Thanks, I'll continue with my testing and tracking down the block.
> 
> -Original Message-
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
> Sent: Wednesday, August 29, 2012 12:28 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Injest pauses
> 
> On Wed, Aug 29, 2012 at 11:58 AM, Voth, Brad (GE Corporate) 
>  wrote:
>> Anyone know the actual status of SOLR-2565, it looks to be marked as 
>> resolved in 4.* but I am still seeing long pauses during commits using
>> 4.*
> 
> SOLR-2565 is definitely committed - adds are no longer blocked by commits (at 
> least at the Solr level).
> 
> -Yonik
> http://lucidworks.com


Re: Load Testing in Solr

2012-08-29 Thread Otis Gospodnetic
Hello,

JMeter, SolrMeter, HP LoadRunner... ah, there is another open-source one that 
I like whose name I can't recall now.

Otis 

Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm 





- Original Message -
> From: dhaivat dave 
> To: solr-user@lucene.apache.org
> Cc: 
> Sent: Wednesday, August 29, 2012 2:44 PM
> Subject: Load Testing in Solr
> 
> Hello everyone .
> 
> Can any one know any component or tool that can be used for testing the
> solr performance.
> 
> 
> 
> 
> 
> 
> Thanks & Regards
> Dhaivat
>


Re: Injest pauses

2012-08-29 Thread Otis Gospodnetic
Hello Brad,

At one point you said CPU is at 100% and there is no disk IO.  Then in a 
separate email I think you said this happens during RAM -> Disk flush.  Isn't 
there a contradiction here?

A few thread dumps may tell you where things are "stuck".

Also, how does your JVM look while this is happening?  Could this be just 
Garbage Collection?  SPM (see URL in sig) may be helpful here.

Otis

Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm 




- Original Message -
> From: "Voth, Brad (GE Corporate)" 
> To: "solr-user@lucene.apache.org" 
> Cc: 
> Sent: Wednesday, August 29, 2012 1:05 PM
> Subject: RE: Injest pauses
> 
> Interestingly it is not pausing during every commit so at least a portion of 
> the 
> time the async commit code is working.  Trying to track down the case where a 
> wait would still be issued.
> 
> -Original Message-
> From: Voth, Brad (GE Corporate) 
> Sent: Wednesday, August 29, 2012 12:32 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Injest pauses
> 
> Thanks, I'll continue with my testing and tracking down the block.
> 
> -Original Message-
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
> Sent: Wednesday, August 29, 2012 12:28 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Injest pauses
> 
> On Wed, Aug 29, 2012 at 11:58 AM, Voth, Brad (GE Corporate) 
>  wrote:
>>  Anyone know the actual status of SOLR-2565, it looks to be marked as 
>>  resolved in 4.* but I am still seeing long pauses during commits using
>>  4.*
> 
> SOLR-2565 is definitely committed - adds are no longer blocked by commits (at 
> least at the Solr level).
> 
> -Yonik
> http://lucidworks.com
>


Re: SolrCloud admin UI core/stats showing commit count even without no explicit commit

2012-08-29 Thread Erick Erickson
Been busy the last couple of days, sorry it took so long to get back

You have basically 2 questions:

About the 80% rate. It's not quite clear. What I meant was say you have
20M docs on a server. You push it until you max out the QPS rate, say that's
100 queries/second. Now, configure your load testing to put 80 QPS at the
hardware, and keep indexing documents until the QPS rate falls off. That
gives you a good upper bound on the number of docs you can put on your
hardware.

About racking more replicas. Under the covers, there's an internal load
balancing act that gets done. As you add more machines to your SolrCloud
cluster, each one becomes a replica of one of your shards. So if you have 2
shards, adding the fifth machine just becomes the 3rd replica of shard 1.
Adding a sixth machine becomes replica 3 of shard 2. The seventh machine
becomes replica 4 of shard 1. And so on.

Now, incoming queries only have to hit one replica of each shard. So your
QPS rate goes up. This is done internally or you can front your entire
cluster with a load balancer, either way.

BTW, another thing to look at is how memory is consumed on your machines.
Solr usually comes under memory pressure first, so if you're seeing a bunch
of swapping etc, you're probably putting too many docs on each shard.

Unfortunately, testing is really the only way to be sure.

And yeah, the whole SolrCloud is pretty new, and the docs always lag the
code.

Best
Erick


On Mon, Aug 27, 2012 at 11:31 AM, Srikanth S  wrote:
> Thanks for your response Erick.
>
> Your explanation seems to make sense for the commit count. But I guess the
> UI needs to be fixed.
>
> Regarding the performance, I went through your blog (nicely written btw
> (and good links to other interesting blogs too)). I didn't realize that
> everything that is indexed needs to be kept in memory for reasonable
> performance, and in that case 133M documents (each with several indexed
> fields) per shard, and for a server hosting 2 such shards, the memory we
> have provided does seem to be very less. I think we need to do an
> evaluation of our hardware as you pointed out. I didn't get one thing in
> your blog though: the paragraph that starts with: "Now, take say 80% of the
> QPS rate above...". I am assuming you meant "Keep adding 1M documents and
> see the point where the QPS drops to 80% of the above value". Correct me if
> I am wrong.
>
> Wrt the query rate, we were able to run at around 80-90 searches/sec with
> indexing off, and 50-60 searches/sec while indexing at an average rate of
> 500 inserts/sec.
>
> Regarding stacking up of replicas to get more QPS, I would have expected
> the same, but with very little documentation (and with some of them
> conflicting) on SolrCloud design, I was not very sure about that. So, if
> you can, and if you have access to, can you point me to some places where
> more details about the architecture of SolrCloud is explained? I'd
> appreciate that greatly.
>
> Thanks again.
>
> On Mon, Aug 27, 2012 at 6:33 AM, Erick Erickson 
> wrote:
>
>> The autocommits are about what I'd expect. 17 hours
>> == 102 ten minute blocks, which is roughly your
>> 115 autocommits. I'm _guessing_ that the total
>> commits are a combination of soft and hard. You'll
>> have 20,400 soft commits in that time frame, so this
>> works as a rough estimate
>>
>> And SolrJ doesn't do a commit after an add unless
>> you tell it to.
>>
>> As for search performance, it's quite hard to tell, But
>> you have about 133M documents/shard, and two
>> replicas. You have a relatively small amount of
>> memory allocated for indexes that size. It's time to
>> just dig into what you can expect out of your boxes.
>>
>> Here's a blog that outlines a way to understand more
>> about the capacity of your hardware that might help.
>> I'd take the SolrCloud bits out for right now, and just
>> concentrate on the capacity of the machine in your
>> situation, then add SolrCloud back in to the mix.
>>
>> http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>>
>> It'd be interesting to see what your query rate
>> was if you stop the indexing process. Mostly I'm
>> just looking for which factors change performance,
>> not recommending that you go with that approach.
>>
>> The good news is that you can get virtually whatever
>> QPS rate you need by simply racking in more replicas
>> for each shard
>>
>> Best
>> Erick
>>
>>
>>
>>
>> On Sat, Aug 25, 2012 at 3:04 AM, Srikanth S  wrote:
>> > Hi,
>> >
>> > I am doing a small test for my company to see if SolrCloud is suitable
>> for
>> > our indexing needs. The setup is as follows:
>> >
>> >- Solr version 4.0 BETA1
>> >- Three physical machines hosting solr servers
>> >- Distributed ZooKeeper setup on the same three machines
>> >- 2 solr cores on each server: total 6 cores
>> >- 3 shards (and hence 1 replica each)
>> >- Machine M1 is leader of (shard 1, replica 1) and hosts 

Re: Configure logging with Solr 4 on Tomcat 7

2012-08-29 Thread Erick Erickson
Have you looked in catalina.out?

Best
Erick

On Mon, Aug 27, 2012 at 12:43 PM, Nicholas Ding  wrote:
> I put a logging.properties into solr/WEB-INF/classes, but I still not see
> any logs.
>
> On Mon, Aug 27, 2012 at 11:56 AM, Chantal Ackermann <
> c.ackerm...@it-agenten.com> wrote:
>
>>
>> Drop the logging.properties file into the solr.war at WEB-INF/classes .
>>
>> See here:
>> http://lucidworks.lucidimagination.com/display/solr/Configuring+Logging
>> Section "Tomcat Logging Settings"
>>
>> Cheers,
>> Chantal
>>
>> Am 27.08.2012 um 16:43 schrieb Nicholas Ding:
>>
>> > Hello,
>> >
>> > I've deployed Solr 4 on Tomcat 7, it is a multicore configuration,
>> > everything seems work fine, but I can't see any logs. How do I enable
>> > logging?
>> >
>> > Thanks
>> > Nicholas
>>
>>


Re: Fail to huge collection extraction

2012-08-29 Thread Erick Erickson
I really think you need to think about firing successive page requests
at the index and reporting in chunks.
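
A minimal SolrJ sketch of that approach (class names per SolrJ 3.6+/4.x; 3.5
uses CommonsHttpSolrServer; the query, URL, and page size are placeholders):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrDocumentList;

  public class PagedExport {
    public static void main(String[] args) throws Exception {
      SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
      final int pageSize = 1000;  // tune to taste
      int start = 0;
      while (true) {
        SolrQuery q = new SolrQuery("*:*");  // the real query goes here
        q.setStart(start);
        q.setRows(pageSize);
        SolrDocumentList page = server.query(q).getResults();
        // stream this page to the download response here instead of
        // buffering the whole million-document result in memory
        start += page.size();
        if (page.isEmpty() || start >= page.getNumFound()) break;
      }
    }
  }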

Best
Erick

On Mon, Aug 27, 2012 at 2:56 PM, neosky  wrote:
> I am using Solr 3.5 and Jetty 8.12
> I need to pull out huge query results at a time (for example, 1 million
> documents, probably a couple of gigabytes) and my machine has about 64 GB of
> memory.
> I use the javabin format and SolrJ as my client, and I use a servlet to provide
> a query download service for the end user. However, when I pull out the whole
> result at once, it fails.
> solrQuery.setStart(0);
> solrQuery.setRows(totalNumber);// the totalNumber sometimes is 1 million)
> logs:
> Aug 27, 2012 2:34:35 PM org.apache.solr.common.SolrException log
> SEVERE: org.eclipse.jetty.io.EofException
> at
> org.eclipse.jetty.io.nio.SelectChannelEndPoint.blockWritable(SelectChannelEndPoint.java:422)
> at
> org.eclipse.jetty.http.AbstractGenerator.blockForOutput(AbstractGenerator.java:512)
> at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:159)
> at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:101)
> at
> org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:184)
> at
> org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:89)
> at
> org.apache.solr.response.BinaryResponseWriter.write(BinaryResponseWriter.java:46)
> at
> org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:336)
> ...
>
> I am not sure where the bottleneck is. I tried to increase the timeout
> solrServer.setSoTimeout(30);
>  solrServer.setConnectionTimeout(300);
>  solrServer.setDefaultMaxConnectionsPerHost(100);
>  solrServer.setMaxTotalConnections(300);
>
> I also tried to increase the cached documents in the Solr configuration
>  2
>
> It doesn't work at all. Any advice will be appreciated!
>
> Btw: I want to use compression, but I don't know how it works, because
> after my Java client pulls out the result, I need to print it out to the end
> user as a download file.
>
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Fail-to-huge-collection-extraction-tp4003559.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Correctly importing and producing null in search results

2012-08-29 Thread Erick Erickson
If I'm reading this right, you're kind of stuck. Solr/DIH don't have any
way to reach out to your mapping file and "do the right thing"

A couple of things come to mind.
Use a Transformer in DIH to simply remove the field from the document
you're indexing. Then the absence of the field in the result set is NULL,
and 0 is 0. You could also do this in SolrJ.
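
A rough sketch of the DIH route (class and column names are assumptions; DIH
finds transformRow(Map) by reflection, and the class is referenced via the
transformer attribute of the <entity> in data-config.xml):

  import java.util.Map;

  public class DropZeroMovieIdTransformer {
    public Object transformRow(Map<String, Object> row) {
      Object v = row.get("movie_id");
      if (v != null && "0".equals(v.toString())) {
        row.remove("movie_id");  // absent field == null in the results
      }
      return row;
    }
  }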

And I have to ask why you transform output into JSON when you could
use the JSON response writer.

Best
Erick

On Mon, Aug 27, 2012 at 6:04 PM, David Martin  wrote:
> Smart Folks:
>
> I use JDBC to produce simple XML entities such as this one:
>
> (element tags were stripped by the archive; the entity's values were:)
>   entity_type: AWARDTYPE
>   movie_id: 0
>   31
>   1
>   id: awardtypes::31:1
>
> The XML entities are stored in a file and loaded by the
> FileListEntityProcessor.
>
> In this case, the "movie_id" element has a value of zero because the JDBC
> getString("movie_id") method returned null.  I can search Solr for
> entities of this type (i.e. query on "entity_type:AWARDTYPE") and get back
> the appropriate result set.  Then, I want to transform the result set into
> JSON objects with fields that map to XML elements.
>
> Today, I have to teach the JSON mapping that it should convert 0 to
> JSONObject.NULL on a case-by-case basis -- I actually keep a mapping
> document around that dictates whether a zero should be handled this way.
>
> In some cases though, a zero may be legitimate where null values are also
> legit.  Sure, I could always change the zero to a less likely integer or
> such...
>
> ===
> But doesn't Solr and the Data Import Handler have a better way to read a
> null value from an XML entity during import, AND to represent it in search
> results?  Do I need a different approach depending on my field's type?
> ===
>
> I apologize if this is an asked and answered question.  None of my web
> searches turned up an answer.
>
> Thanks,
>
> David
>


Re: Null Pointer Exception on DIH with MySQL

2012-08-29 Thread Erick Erickson
Not much information to go on here, have you tried the DIH
debugging console? See:
http://wiki.apache.org/solr/DataImportHandler#interactive

Best
Erick

On Mon, Aug 27, 2012 at 7:22 PM, Aleksey Vorona  wrote:
> We have Solr 3.6.1 running on Jetty (7.x) and using DIH to get data from the
> MySQL database. On one of the environment the import always fails with an
> exception: http://pastebin.com/tG28cHPe
>
> It is a null pointer exception on connection being null. I've tested that I
> can connect from the Solr server to Mysql server via command line mysql
> client.
>
> Does anybody knows anything about this exception and how to fix it?
>
> I am not able to reproduce it on any other environment.
>
> -- Aleksey


Re: Frequently Updated Index and Caching

2012-08-29 Thread Erick Erickson
Hmmm, the critical thing here is not how often you change the index,
it's how often you commit.

Look at your Solr admin/stats page and your logs. You'll
see things like "hit ratio" and "cumulative hit ratio" for, particularly, your
filtercache. Whether you're getting decent hit ratios is what tells you whether
disabling caching is a good idea or not.
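
If the ratios do turn out to be poor, disabling a cache is just a matter of
removing (or commenting out) its element in solrconfig.xml, e.g.:

  <!--
  <filterCache class="solr.FastLRUCache" size="512"
               initialSize="512" autowarmCount="0"/>
  -->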

Best
Erick

On Tue, Aug 28, 2012 at 3:37 AM, deniz  wrote:
> Hi All,
>
> If we are updating our index very frequently on some fields, do we still
> need to use caching, or can we simply disable it?
>
> http://wiki.apache.org/solr/SolrCaching#Overview
>
> After reading this part I feel like a frequently updated index won't need a
> cache for performance, as its data will be changing so frequently that the
> entries on the old cache could be old.
>
> I have read the rest of the page too and I think I should disable caching
> for our case, but I can't be sure whether caching could improve the
> performance or not...
>
> has anyone used caching with frequent updates? If so, could you please give
> me some information about that?
>
>
>
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Frequently-Updated-Index-and-Caching-tp4003626.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Shard Replicas sharing files

2012-08-29 Thread Erick Erickson
Possible, kinda maybe. But then all of the SolrCloud goodness that's
there for HA/DR goes out the window because the shared index (actually
the hardware it's on) becomes a single point of failure. On the other
hand, you're using the word replica but not explicitly talking about
SolrCloud, so I guess this is just about standard master/slave
situations...

Where the answer is that it's generally not a great idea to share
indexes like this. The disk I/O becomes your bottleneck, with all those
slaves asking to pull what they need off the disk at once every time
the index is committed to, compounded with network latency.

But I have to ask, is this just a theoretical question or is it really
something you're having trouble with in production?

And the idea of a "replication tree", where N slaves get their index
from the master, then M slaves get their index from the first N slaves
sounds like a "repeater" setup, see:
http://wiki.apache.org/solr/SolrReplication#Setting_up_a_Repeater
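
For reference, a repeater is just a core whose /replication handler is
configured as both master and slave (the masterUrl is a placeholder), roughly:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="confFiles">schema.xml</str>
    </lst>
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>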

Best
Erick

On Wed, Aug 29, 2012 at 4:23 AM, Christian von Wendt-Jensen
 wrote:


Re: Custom close to index metadata / pass commit data to writer.commit

2012-08-29 Thread Erick Erickson
You have to look at the "Resolution" entry. It's currently "unresolved", so
it hasn't been committed.

Best
Erick

On Wed, Aug 29, 2012 at 5:27 AM, Jozef Vilcek  wrote:
> Hi,
>
> I just wanted to check if someone has an idea about the intentions with this 
> issue:
> https://issues.apache.org/jira/browse/SOLR-2701
>
> It is marked for 4.0-Alpha and there is already a Beta out there.
> Can anyone tell if it is planned to be part of the 4.0 release?
>
> Best,
> Jozef
>
> On Sun, Jun 24, 2012 at 1:18 AM, Erick Erickson  
> wrote:
>> see: https://issues.apache.org/jira/browse/SOLR-2701.
>>
>> But there's an easier alternative. Just have a _very special_ document
>> with a known ID that you index at the end of the run that
>> 1> has no fields in common with any other document (except uniqueKey)
>> 2> contains whatever data you want to carry around in whatever format you 
>> want.
>>
>> Now whenever you query for that document by ID, you get your info. And
>> since you can't search the doc until after it's been committed, you know
>> that the preceding documents have all been persisted
>>
>> Of course whenever you send a version of the doc it will overwrite the
>> one before since it has the same ID.
>>
>> Best
>> Erick
>>
>> On Fri, Jun 22, 2012 at 5:34 AM, Jozef Vilcek  wrote:
>>> Hi everyone,
>>>
>>> I am seeking a solution to store some custom data very close to /
>>> within index. I have found a possibility to pass commit "user" data to
>>> IndexWriter:
>>> http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/index/IndexWriter.html#commit(java.util.Map)
>>> which are from what I understand stored somewhere close to segments
>>> "metadata" like index version, generation, ...
>>>
>>> Now, I see no easy way to accumulate and pass along such data with
>>> Solr 3.6. DirectUpdateHandler2 is committing implicitly via close
>>> rather than invoking commit API. I can extend DirectUpdateHander2 and
>>> alter closeWriter method but still ... I am not yet clear how to pass
>>> along request level params which are not available at
>>> DirectUpdateHandler2 level. It seems that passing commitData is not
>>> supported ( maybe not wanted to by by design ) and not going to be as
>>> when I look at Solr trunk, I see implicit commit removed,
>>> writer.commit with passing commitData used but no easy way how to pass
>>> custom commit data nor how to easily hook in.
>>>
>>> Any recommendations for how to store some data close to index?
>>>
>>> To throw some light on why I want this ... Basically I want to store
>>> there some kind of time stamp, which defines what is already in the
>>> index with respect to feeding updates from the external world. Now, my
>>> index is replicated to another index instance in a different data center
>>> (serving traffic as well). When the default document feed in DC1 goes south
>>> for some reason, the backup in DC2 jumps in to keep updates alive ... but
>>> it has to know from where the feed should start ... that would be that
>>> kind of time stamp stored and replicated with the index.
>>>
>>> Many thanks in advance.
>>>
>>> Best,
>>> Jozef


Re: Re: Antwort: Re: refiltering search results

2012-08-29 Thread Erick Erickson
Perhaps you're making this harder than it needs to be.

The preferred way of handling ACL calculations is by group. That is,
you just have a multiValued field in each document that contains the
groups have permissions for that document, say G1, G2, G3. Then you
just add an fq clause for the query that contains the groups the user
belongs to, as &fq=acl:(G1 G2 G3). this woks well when you have, say,
< 100 groups (or authorization tokens) and know them at index time.

The other thing you can consider is a PostFilter, here's the JIRA:
https://issues.apache.org/jira/browse/SOLR-2429

The third option is to use a custom Function, see
http://wiki.apache.org/solr/FunctionQuery
The trick here is that you can use the contents of fields (really, numeric
data works best) to compute a value that is multiplied into the score. If
you return 0 when the user isn't permitted access, the score becomes 0
and the document isn't returned.

The thing you have to be careful of is that accessing the contents of fields
for each document in a search can be very, very costly.

Best
Erick


On Wed, Aug 29, 2012 at 7:28 AM,   wrote:
> Von:
> Ahmet Arslan 
> An:
> solr-user@lucene.apache.org
> Datum:
> 29.08.2012 10:50
> Betreff:
> Re: Antwort: Re: refiltering search results
>
>
> Thanks for the answer.
>
My next question is: how can I filter the result, or how do I replace the old
ResponseBuilder result with a new one?
>
>
> --- On Wed, 8/29/12, johannes.schwendin...@blum.com
>  wrote:
>
>> From: johannes.schwendin...@blum.com 
>> Subject: Antwort: Re: refiltering search results
>> To: solr-user@lucene.apache.org
>> Date: Wednesday, August 29, 2012, 8:22 AM
>> The main idea is to filter results as
>> much as possible with Solr and then
>> check this result again.
>> To do this I have to read some information from some fields
>> of the
>> documents in the result.
>> At the moment I am trying to do this in the process method
>> of a Search
>> Component. But I even don't know
>> how to get access to the search results or the index Fields
>> of the
>> documents.
>> I have thought of ResponseBuilder.getResults() but after I
>> have the
>> DocListandSet Object I get stuck.
>
>
> You can read information from some fields using DocListAndSet with the
>
> org.apache.solr.util.SolrPluginUtils#docListToSolrDocumentList
>
> method.
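
A rough sketch of how that fits into a custom SearchComponent (field names,
and the re-filtering itself, are assumptions; error handling omitted):

  import java.io.IOException;
  import java.util.Arrays;
  import java.util.HashSet;
  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.common.SolrDocumentList;
  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.handler.component.SearchComponent;
  import org.apache.solr.util.SolrPluginUtils;

  public class AclRecheckComponent extends SearchComponent {
    @Override
    public void prepare(ResponseBuilder rb) { /* nothing to do */ }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
      SolrDocumentList docs = SolrPluginUtils.docListToSolrDocumentList(
          rb.getResults().docList,               // hits from the query component
          rb.req.getSearcher(),
          new HashSet<String>(Arrays.asList("id", "acl")),  // fields to read
          null);
      for (SolrDocument doc : docs) {
        Object acl = doc.getFieldValue("acl");
        // re-check permissions here and rebuild the response with survivors
      }
    }

    @Override
    public String getDescription() { return "ACL re-check sketch"; }
    @Override
    public String getSource() { return ""; }
  }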
>


Re: Hierarchical faceting and filter query exclusions

2012-08-29 Thread Erick Erickson
See "Tagging and excluding filters" here:
http://lucidworks.lucidimagination.com/display/solr/Faceting


Best
Erick


On Wed, Aug 29, 2012 at 11:44 AM, Nicholas Swarr  wrote:
> We're using Solr 4.0 Beta, testing the hierarchical faceting support to see 
> if it's a good fit to facet on taxonomies.  One issue we've encountered is 
> that we can't apply filter exclusions to the hierarchical facets so as to 
> preserve facet count with multi-select.  I haven't been able to locate or 
> otherwise determine if there's documentation that would outline how this is 
> done.  We've tried a few things with local params but it appears those aren't 
> parsed with the facet.pivot argument.  I found this ticket related to that:
>
> https://issues.apache.org/jira/browse/SOLR-2255
>
> Could anyone offer some insight or guidance on this?
>
> Thanks,
> Nick


Re: Solr contribs build and jar-of-jars

2012-08-29 Thread Lance Norskog
I found a couple implementations of a crazy classloader that finds
jars inside a jar. I tested the 'zipfileset' feature of 'ant zip'
which works well. It unpacks the outboard jar directly into the target
without making a separate staging directory, and ran surprisingly fast
on my laptop. So, jars-in-a-jar are a less optimal feature than they
used to be.

Yeah, I know about the Maven plugin. Mahout uses it, and it does an
INFO log for every stinking class file it repacks. Like I care!

On Sun, Aug 19, 2012 at 11:43 PM, Chantal Ackermann
 wrote:
> Hi Lance,
>
> does this do what you want?
>
> http://maven.apache.org/plugins/maven-assembly-plugin/descriptor-refs.html#jar-with-dependencies
>
> It's maven but that would be an advantage I'd say… ;-)
>
> Chantal
>
> Am 05.08.2012 um 01:25 schrieb Lance Norskog:
>
>> Has anybody tried packaging the contrib distribution jars in the
>> jar-of-jars format? Or merging all included jars into one super-jar?
>>
>> The OpenNLP contrib has a Lucene analyze, 3 external jars, and Solr
>> classes. Packaging this sucker is proving painful in the extreme. UIMA
>> has the same problem. 'ant' has a task for generating the manifest
>> class path for a jar-of-jars, and the technique actually works:
>>
>> http://ant.apache.org/manual/Tasks/manifestclasspath.html
>> http://stackoverflow.com/questions/858766/generate-manifest-class-path-from-classpath-in-ant
>> http://grokbase.com/t/ant/user/0213wdmn51/building-a-fileset-dynamically#20020103j47ufvwooklrovrjfdvirgohe4
>>
>> If this works completely, it seems like the right way to build the
>> dist/ jars for the contribs.
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>



-- 
Lance Norskog
goks...@gmail.com


Re: Document Processing

2012-08-29 Thread Lance Norskog
I've seen the JSoup HTML parser library used for this. It worked
really well. The Boilerpipe library may be what you want. Its
schwerpunkt (*) is to separate boilerplate from wanted text in an HTML
page. I don't know what fine-grained control it has.

* raison d'être. There is no English word for this concept.

On Tue, Dec 6, 2011 at 1:39 PM, Tommaso Teofili
 wrote:
> Hello Michael,
>
> I can help you with using the UIMA UpdateRequestProcessor [1]; the current
> implementation uses in-memory execution of UIMA pipelines but since I was
> planning to add the support for higher scalability (with UIMA-AS [2]) that
> may help you as well.
>
> Tommaso
>
> [1] :
> http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java
> [2] : http://uima.apache.org/doc-uimaas-what.html
>
> 2011/12/5 Michael Kelleher 
>
>> Hello Erik,
>>
>> I will take a look at both:
>>
>> org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessor
>>
>> and
>>
>> org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessor
>>
>>
>> and figure out what I need to extend to handle processing in the way I am
>> looking for.  I am assuming that "component" configuration is handled in a
>> standard way such that I can configure my new UpdateProcessor in the same
>> way I would configure any other UpdateProcessor "component"?
>>
>> Thanks for the suggestion.
>>
>>
>> 1 more question:  given that I am probably going to convert the HTML to
>> XML so I can use XPath expressions to "extract" my content, do you think
>> that this kind of processing will overload Solr?  This Solr instance will
>> be used solely for indexing, and will only ever have a single ManifoldCF
>> crawling job feeding it documents at one time.
>>
>> --mike
>>



-- 
Lance Norskog
goks...@gmail.com


Re: How do I represent a group of customer key/value pairs

2012-08-29 Thread Lance Norskog
I do not understand exactly the data modeling problem.
PathHierarchyTokenizerFactory may be what you're looking for. You
might have to combine this with a charfilter or some token filters to
get exactly what you want. Maybe have two fields, one which only saves
the leaf words and the other that only saves the tree words?

On Sat, Aug 25, 2012 at 7:45 PM, Sheldon P  wrote:
> Thanks Lance.  It looks like it's worth investigating.  I've already
> started down the path of using a bean with "@Field(map_*)" on my
> HashMap setter.  This defect tipped me off on this functionality:
> https://issues.apache.org/jira/browse/SOLR-1357
> This technique provides me with a mechanism to store the HashMap data,
> but flattens the structure.  I'll play with the ideas provided on
> "http://wiki.apache.org/solr/HierarchicalFaceting";.  If anyone has
> some sample code (java + schema.xml) they can point me to that does
> "Hierarchical Faceting" I would very much appreciate it.
>
>
> On Sat, Aug 25, 2012 at 6:42 PM, Lance Norskog  wrote:
>> There are more advanced ways to embed hierarchy in records. This describes 
>> them:
>>
>> http://wiki.apache.org/solr/HierarchicalFaceting
>>
>> (This is a great page, never noticed it.)
>>
>> On Fri, Aug 24, 2012 at 8:12 PM, Sheldon P  wrote:
>>> Thanks for the prompt reply Jack.  Could you point me towards any code
>>> examples of that technique?
>>>
>>>
>>> On Fri, Aug 24, 2012 at 4:31 PM, Jack Krupansky  
>>> wrote:
 The general rule in Solr is simple: denormalize your data.

 If you have some maps (or tables) and a set of keys (columns) for each map
 (table), define fields with names like _, such as
 "map1_name", "map2_name", "map1_field1", "map2_field1". Solr has dynamic
 fields, so you can define "_*" to have a desired type - if all 
 the
 keys have the same type.

 -- Jack Krupansky

 -Original Message- From: Sheldon P
 Sent: Friday, August 24, 2012 3:33 PM
 To: solr-user@lucene.apache.org
 Subject: How do I represent a group of customer key/value pairs


 I've just started to learn Solr and I have a question about modeling data
 in the schema.xml.

 I'm using SolrJ to interact with my Solr server.  It's easy for me to store
 key/value paris where the key is known.  For example, if I have:

 title="Some book title"
 author="The authors name"


 I can represent that data in the schema.xml file like this:

 <field name="title" type="..." indexed="true" stored="true"/>
 <field name="author" type="..." indexed="true" stored="true"/>

 I also have data that is stored as a Java HashMap, where the keys are
 unknown:

 Map<String, String> map = new HashMap<String, String>();
 map.put("some unknown key", "some unknown data");
 map.put("another unknown key", "more unknown data");


 I would prefer to store that data in Solr without losing its hierarchy.
 For example:

 

 >>> stored="true"/>

 >>> stored="true"/>

 


 Then I could search for "some unknown key", and receive "some unknown 
 data".

 Is this possible in Solr?  What is the best way to store this kind of data?
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com



-- 
Lance Norskog
goks...@gmail.com


Patch 2429 for solr1.3?

2012-08-29 Thread Sujatha Arun
Can we use the patch from SOLR-2429 in Solr 1.3?

Regards
Sujatha


Re: AW: AW: auto completion search with solr using NGrams in SOLR

2012-08-29 Thread aniljayanti
Hi 

thanks,

I tried with the changes suggested below, but I am getting the same result as earlier.

suggest/?q="michael ja"
---

  (field type configuration stripped by the mailing list archive)

Response (suggester output; XML tags stripped by the archive):

  status=0, QTime=1
  suggestions for "michael" (numFound=10, startOffset=1, endOffset=8):
    michael "bully" herbig, michael bolton, michael bolton: arias,
    michael falch, michael holm, michael jackson, michael neale,
    michael penn, michael salgado, michael w. smith
  suggestions for "ja" (numFound=10, startOffset=9, endOffset=11):
    ja me tanssimme, jacob andersen, jacob haugaard, jagged edge,
    jaguares, jamiroquai, jamppa tuominen, jane olivor, janis joplin,
    janne tulkki
  collation: "michael "bully" herbig ja me tanssimme"

Please Help,

AnilHayanti




--
View this message in context: 
http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p4004230.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: AW: AW: auto completion search with solr using NGrams in SOLR

2012-08-29 Thread aniljayanti
Hi,

thanks,

I added "PatternReplaceFilterFactory" like below.Getting results
differently(not like suggester). You suggested to remove
"KeywordTokenizerFactory" , "PatternReplace" is a FilterFactory, then which
"TokenizerFactory" need to use ? 

  

  
   

  
  
   
   
  
 
  
 
   
  

Result (XML tags stripped by the archive):

  status=0, QTime=2
  suggestions for "michael" (numFound=10, startOffset=0, endOffset=7):
    michael, michael, michael ", michael j, michael ja, michael jac,
    michael jack, michael jacks, michael jackso, michael jackson
  suggestions for "ja" (numFound=10, startOffset=8, endOffset=10):
    ja, ja, jag, jam, jami, jamp, jampp, jamppa, jamppa, jamppa t
  collation: michael ja

Please suggest if anything is missing.

Regards,

AnilJayanti



--
View this message in context: 
http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p4004231.html
Sent from the Solr - User mailing list archive at Nabble.com.


solrj api for partial document update

2012-08-29 Thread Yoni Amir
Is there a SolrJ API for partial document update in Solr 4?

It is described here: 
http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/

That article explains what the XML structure should look like. I want to use the 
SolrJ API, but I can't figure out if it is supported.

Thanks,
Yoni
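
For what it's worth, in the Solr 4 releases the SolrJ route is to pass a Map
whose key is the update operation ("set", "add", "inc") as the field value; a
sketch (URL, id, and field names are made up):

  import java.util.Collections;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class PartialUpdateExample {
    public static void main(String[] args) throws Exception {
      SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "book-1");  // uniqueKey of the doc to patch
      doc.addField("price", Collections.singletonMap("set", 12.5));   // replace value
      doc.addField("tags", Collections.singletonMap("add", "sale"));  // append to multiValued

      server.add(doc);
      server.commit();
    }
  }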



Sorting on multivalued fields still impossible?

2012-08-29 Thread Uwe Reh

Hi,
just to be sure.

There is still no way to sort by multivalued fields?
"...&sort=max(datefield) desc&"

There is no smarter option than creating additional single-valued 
fields just for sorting?

"e.g. datefield_max and datefield_min"

Uwe


Re: Null Pointer Exception on DIH with MySQL

2012-08-29 Thread Aleksey Vorona
Thank you for the reply. We rebuilt Solr from source, reinstalled it, 
and the problem went away. As it was never reproducible on any other 
server, I blame some mysterious Java bytecode corruption on that 
server - an assumption I will never be able to verify, because we did 
not make a copy of the previous binaries.


-- Aleksey

On 12-08-29 06:17 PM, Erick Erickson wrote:

Not much information to go on here, have you tried the DIH
debugging console? See:
http://wiki.apache.org/solr/DataImportHandler#interactive

Best
Erick

On Mon, Aug 27, 2012 at 7:22 PM, Aleksey Vorona  wrote:

We have Solr 3.6.1 running on Jetty (7.x) and using DIH to get data from the
MySQL database. On one of the environment the import always fails with an
exception: http://pastebin.com/tG28cHPe

It is a null pointer exception on connection being null. I've tested that I
can connect from the Solr server to Mysql server via command line mysql
client.

Does anybody knows anything about this exception and how to fix it?

I am not able to reproduce it on any other environment.

-- Aleksey