Re: SolrCloud High Availability during indexing operation

2013-09-23 Thread Saurabh Saxena
Doc count did not change after I restarted the nodes. I am doing a single
commit after all 80k docs. Using Solr 4.4.

Regards,
Saurabh


On Mon, Sep 23, 2013 at 6:37 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Interesting. Did the doc count change after you started the nodes again?
> Can you tell us about commits?
> Which version? 4.5 will be out soon.
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Sep 23, 2013 8:37 PM, "Saurabh Saxena"  wrote:
>
> > Hello,
> >
> > I am testing High Availability feature of SolrCloud. I am using the
> > following setup
> >
> > - 8 linux hosts
> > - 8 Shards
> > - 1 leader, 1 replica / host
> > - Using Curl for update operation
> >
> > I tried to index 80K documents on the replicas (10K/replica in parallel).
> > During the indexing process, I stopped 4 leader nodes. Once indexing was
> > done, out of 80K docs only 79808 docs were indexed.
> >
> > Is this expected behaviour? In my opinion, the replica should take over
> > indexing if the leader is down.
> >
> > If this is expected behaviour, are there any steps that can be taken from
> > the client side to avoid such a situation?
> >
> > Regards,
> > Saurabh Saxena
> >
>


Re: solr4.4 admin page show "loading"

2013-09-23 Thread William Bell
Use Chrome.


On Thu, Sep 19, 2013 at 7:32 AM, Micheal Chao wrote:

> hi, I have installed solr4.4 on tomcat7.0. The problem is I can't see the
> solr admin page; it always shows "loading". I can't find any errors in the
> tomcat logs, and I can send search requests and get results.
>
> what can I do? please help me, thank you very much.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solr4-4-admin-page-show-loading-tp4091039.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


join data from multiple collections

2013-09-23 Thread YouPeng Yang
Hi

   I have two collections with different schemas,
   and I want to do an inner join like SQL:

  select A.xx, B.xx
  from A, B
  where A.yy = B.yy

   How can I achieve this in Solr? I'm using SolrCloud with Solr 4.4.
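   For reference, the closest built-in mechanism appears to be the {!join}
   query parser - a sketch only, reusing the field names from the SQL above
   and assuming both collections live on the same node:

  /solr/B/select?q={!join from=yy to=yy fromIndex=A}*:*&fl=xx

   Note that {!join} returns documents (and fields) from only one side (B
   here), so a true SQL-style select of A.xx and B.xx in a single query is
   not possible.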


regards


Re: java.lang.LinkageError when using custom filters in multiple cores

2013-09-23 Thread Hayden Muhl
Upgraded to 4.4.0, and that seems to have fixed it.

The transition was mostly painless once I realized that the interface to
the AbstractAnalysisFactory had changed between 4.2 and 4.3.

Thanks.

- Hayden


On Sat, Sep 21, 2013 at 3:28 AM, Alexandre Rafalovitch
wrote:

> Did you try the latest Solr? There was a library-loading bug with multiple
> cores. Not a perfect match to your description, but close enough.
>
> Regards,
> Alex
> On 21 Sep 2013 02:28, "Hayden Muhl"  wrote:
>
> > I have two cores "favorite" and "user" running in the same Tomcat
> instance.
> > In each of these cores I have identical field types "text_en", "text_de",
> > "text_fr", and "text_ja". These fields use some custom token filters I've
> > written. Everything was going smoothly when I only had the "favorite"
> core.
> > When I added the "user" core, I started getting java.lang.LinkageErrors
> > being thrown when I start up Tomcat. The error always happens with one of
> > the classes I've written, but it's unpredictable which class the
> > classloader chokes on.
> >
> > Here's the really strange part. I comment out the "text_*" fields in the
> > user core and the errors go away (makes sense). I add "text_en" back in,
> no
> > error (OK). I add "text_fr" back in, no error (OK). I add "text_de" back
> > in, and I get the error (ah ha!). I comment "text_de" out again, and I
> > still get the same error (wtf?).
> >
> > I also put a break point at
> >
> >
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:424),
> > and when I load everything one at a time, I don't get any errors.
> >
> > I'm running Tomcat 5.5.28, Java version 1.6.0_39 and Solr 4.2.0. I'm
> > running this all within Eclipse 1.5.1 on a mac. I have not tested this
> on a
> > production-like system yet.
> >
> > Here's an example stack trace. In this case it was one of my Japanese
> > filters, but other times it will choke on my synonym filter, or my
> compound
> > word filter. The specific class it fails on doesn't seem to be relevant.
> >
> > SEVERE: null:java.lang.LinkageError: loader (instance of
> > org/apache/catalina/loader/WebappClassLoader): attempted duplicate class
> > definition for name: "com/shopstyle/solrx/KatakanaVuFilterFactory"
> > at java.lang.ClassLoader.defineClass1(Native Method)
> > at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
> > at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
> > at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
> > at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
> > at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
> > at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > at org.apache.catalina.loader.WebappClassLoader.findClass(WebappClassLoader.java:904)
> > at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1353)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:295)
> > at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:627)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > at java.lang.Class.forName0(Native Method)
> > at java.lang.Class.forName(Class.java:249)
> > at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:424)
> > at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:462)
> > at org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:89)
> > at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
> > at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:392)
> > at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:86)
> > at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
> > at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
> > at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:373)
> > at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:121)
> > at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1018)
> > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
> > at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
> > at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > at java.util.concurrent.ThreadPoolExecu

Re: Solr query processing

2013-09-23 Thread Otis Gospodnetic
That's right.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Sep 23, 2013 12:55 PM, "Scott Smith"  wrote:

> I just want to state a couple of things and hear someone say, "that's
> right".
>
>
> 1.   In a Solr query you can have multiple fq's, but only a single q.
>  And yes, I can simply AND the multiple "q"s together.  Just want to avoid
> that if I'm wrong.
>
> 2.   A subtler issue is that when a full query is executed, Solr must
> look at the schema to see how each field was tokenized (or not) and the
> various other filters applied to a field, so that it can properly transform
> field data (e.g., tokenize the text, but not keywords).  As an aside, it
> would be nice if the query parser could do the same thing in Lucene (I know,
> wrong forum :)).
> Scott
>


Re: SolrCloud High Availability during indexing operation

2013-09-23 Thread Otis Gospodnetic
Interesting. Did the doc count change after you started the nodes again?
Can you tell us about commits?
Which version? 4.5 will be out soon.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Sep 23, 2013 8:37 PM, "Saurabh Saxena"  wrote:

> Hello,
>
> I am testing High Availability feature of SolrCloud. I am using the
> following setup
>
> - 8 linux hosts
> - 8 Shards
> - 1 leader, 1 replica / host
> - Using Curl for update operation
>
> I tried to index 80K documents on the replicas (10K/replica in parallel).
> During the indexing process, I stopped 4 leader nodes. Once indexing was
> done, out of 80K docs only 79808 docs were indexed.
>
> Is this expected behaviour? In my opinion, the replica should take over
> indexing if the leader is down.
>
> If this is expected behaviour, are there any steps that can be taken from
> the client side to avoid such a situation?
>
> Regards,
> Saurabh Saxena
>


SolrCloud High Availability during indexing operation

2013-09-23 Thread Saurabh Saxena
Hello,

I am testing High Availability feature of SolrCloud. I am using the
following setup

- 8 linux hosts
- 8 Shards
- 1 leader, 1 replica / host
- Using Curl for update operation

I tried to index 80K documents on the replicas (10K/replica in parallel).
During the indexing process, I stopped 4 leader nodes. Once indexing was
done, out of 80K docs only 79808 docs were indexed.

Is this expected behaviour? In my opinion, the replica should take over
indexing if the leader is down.

If this is expected behaviour, are there any steps that can be taken from
the client side to avoid such a situation?

Regards,
Saurabh Saxena


Select all descendants in a relation index

2013-09-23 Thread Semiaddict
Hello,

I am using Solr to index Drupal node relations (over 300k relations on over 
500k nodes), where each relation consists of the following fields:
- id : the id of the relation
- source_id : the source (parent) node id
- target_id : the target (child) node id

I need to be able to find all descendants of a node with one query.
So far I've managed to get direct children using the join syntax of Solr4 such 
as (http://wiki.apache.org/solr/Join): 
/solr/collection/select?q={!join from=source_id to=target_id}source_id:12

Note that each node can have multiple parents and multiple children.

Is there a way to get all descendants of node 12 without having to create a
loop in PHP to find all children, then all children of each child, etc.?
If not, is it possible to create a recursive query directly in Solr, or is
there a better way to index tree structures?
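Failing a server-side option, a client-side breadth-first walk over the
relation index is one common workaround. A minimal sketch with SolrJ 4.x,
assuming the field names above and at most 1000 children per node (both
assumptions):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class DescendantWalker {
  // Collects all descendant node ids reachable from rootId via relation docs.
  public static Set<String> descendants(HttpSolrServer solr, String rootId)
      throws SolrServerException {
    Set<String> seen = new HashSet<String>();
    Deque<String> frontier = new ArrayDeque<String>();
    frontier.add(rootId);
    while (!frontier.isEmpty()) {
      SolrQuery q = new SolrQuery("source_id:" + frontier.poll());
      q.setFields("target_id");
      q.setRows(1000); // assumption: no node has more than 1000 children
      for (SolrDocument doc : solr.query(q).getResults()) {
        String child = (String) doc.getFieldValue("target_id");
        if (child != null && seen.add(child)) {
          frontier.add(child); // first time we see this node: expand it too
        }
      }
    }
    return seen;
  }
}

This issues one query per node, but it terminates correctly even if the
graph contains cycles, since each node is expanded at most once.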

Any help or suggestion would be highly appreciated.

Thank you in advance,

Semiaddict


Using CachedSqlEntityProcessor with delta imports in DIH

2013-09-23 Thread David Larochelle
I'm trying to use the CachedSqlEntityProcessor on a child entity that also
has a delta query.

Full imports and delta imports of the parent entity work fine; however, delta
imports for the child entity have no effect. If I remove the
processor="CachedSqlEntityProcessor" attribute from the child entity, the
delta import works flawlessly, but the full import is very slow.
Here's my data-config.xml:

[data-config.xml stripped by the mail archive; only an XInclude reference to
http://www.w3.org/2001/XInclude survives]

I need to be able to run delta imports based on the media_tags_map table in
addition to the story_sentences table.

Any idea why delta imports for media_tags_map won't work when the
CachedSqlEntityProcessor is used?

I've searched extensively but can't find an example that uses both
CachedSqlEntityProcessor and deltaQuery on the sub-entity or any
explanation of why the above configuration won't work as expected.
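For reference, since the XML did not survive, a hedged reconstruction of the
shape such a config takes; the table, column, and datasource names below are
assumptions based on the description above, not the original file:

<dataConfig>
  <dataSource driver="org.postgresql.Driver" url="jdbc:postgresql://localhost/db"/>
  <document>
    <entity name="story_sentences"
            query="SELECT id, sentence FROM story_sentences"
            deltaQuery="SELECT id FROM story_sentences
                        WHERE updated &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT id, sentence FROM story_sentences
                              WHERE id = '${dataimporter.delta.id}'">
      <entity name="media_tags_map"
              processor="CachedSqlEntityProcessor"
              query="SELECT story_id, tag FROM media_tags_map"
              cacheKey="story_id" cacheLookup="story_sentences.id"
              deltaQuery="SELECT story_id FROM media_tags_map
                          WHERE updated &gt; '${dataimporter.last_index_time}'"
              parentDeltaQuery="SELECT id FROM story_sentences
                                WHERE id = '${media_tags_map.story_id}'"/>
    </entity>
  </document>
</dataConfig>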

--

Thanks,

David


Re: Searching for closed polylines that contain a given point

2013-09-23 Thread Smiley, David W.
Mark,

Yes you can.  You should index polygons, not polylines.  A polyline
semantically refers to the line itself, whereas you want to index the
coverage of the nation (the space enclosed by the polyline), not the
border literally.

One thing to be aware of is that indexed non-point shapes are pixelated to
a grid, kind of like how a vector shape is drawn to a computer screen
composed of a matrix of pixels. The approach will hopefully scale ok for
your requirements, but if you want more detailed precision along the
borders then it might not -- if you set distErrPct too low, you'll run out
of memory while indexing.

To get started, see:
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
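A minimal sketch of what that looks like in practice; the field name and the
distErrPct value are assumptions, and indexing WKT polygons requires JTS on
the classpath:

<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
    spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
    geo="true" distErrPct="0.025" maxDistErr="0.000009" units="degrees"/>
<field name="border" type="location_rpt" indexed="true" stored="true"/>

Index each nation's boundary as a WKT POLYGON(...) value in "border"; a
point-in-polygon search is then:

fq=border:"Intersects(POINT(-73.98 40.75))"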

~ David

On 9/23/13 5:21 PM, "Mark Backman"  wrote:

>
>
>I'm new to spatial search within solr.  If I have a set of records
>containing closed polylines describing, say, the boundaries of nations,
>can I use solr to build an index of these records against which I can
>search to see if a point is contained within any of them?
>
>
>Thanks,
>
>-Mark  



Searching for closed polylines that contain a given point

2013-09-23 Thread Mark Backman


I'm new to spatial search within solr.  If I have a set of records containing 
closed polylines describing, say, the boundaries of nations, can I use solr to 
build an index of these records against which I can search to see if a point is 
contained within any of them?


Thanks,

-Mark  

Re: Hash range to shard assignment

2013-09-23 Thread lochri
Yes, actually that would be a very comfortable solution.
Is that planned? And if so, when will it be released?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Hash-range-to-shard-assignment-tp4091204p4091591.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrParams to and from NamedList

2013-09-23 Thread Peter Kirk
Hi,

In a request handler, if I run the code below, I get an exception from Solr:
undefined field: "[Ljava.lang.String;@41061b68"

It appears the conversion between SolrParams and NamedList and back again fails
if one of the parameters is a multi-valued array. This could be a couple of
configuration parameters like
category
author


public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
throws Exception {

  SolrParams params = req.getParams();
  NamedList parameterList = params.toNamedList();
  SolrParams newSolrParams = SolrParams.toSolrParams(parameterList);

  req.setParams(newSolrParams);
  super.handleRequestBody(req, rsp);
}


How can I generate the correct conversion?
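For what it's worth, a sketch of one possible workaround, assuming the goal
is just a mutable copy of the params - ModifiableSolrParams copies
multi-valued parameters without the lossy NamedList round trip:

  // sketch only - avoids the toNamedList()/toSolrParams() round trip
  ModifiableSolrParams newSolrParams = new ModifiableSolrParams(req.getParams());
  req.setParams(newSolrParams);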

Thanks.


Indexing bulk loads of PDF files and extracting information from them

2013-09-23 Thread Sadika Amreen
Hi all,



I am looking to index an entire directory of PDF files. We have a very large
volume of PDFs (3000+, possibly many more), so adding them manually would be
cumbersome.



I have seen more than a couple of dozen links explaining how to index PDFs
using Solr, but none were detailed enough to help me get started.

I understand that indexing a Word or PDF document requires the use of the
ExtractingRequestHandler, which uses Apache Tika.



My question is: how do I configure the handler so that it can extract the
required information from bulk loads of PDFs?

I know I am asking a broad question, but I am struggling to find good
guidance or something that would give me a step-by-step approach.



There is an example configuration in the following link: 
http://wiki.apache.org/solr/ExtractingRequestHandler

I have also seen these threads:

http://stackoverflow.com/questions/5947157/index-search-pdf-content-with-solr

http://www.gossamer-threads.com/lists/lucene/general/158117



I am still trying to understand the configuration process, so any concrete help 
would be welcome.
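For illustration, a sketch of how this is often scripted with SolrJ against
the stock /update/extract handler; the Solr URL, the directory path, and the
use of the file name as unique key are all placeholders, not a vetted recipe:

import java.io.File;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class BulkPdfIndexer {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
    for (File pdf : new File("/path/to/pdfs").listFiles()) {
      if (!pdf.getName().endsWith(".pdf")) continue;
      ContentStreamUpdateRequest req =
          new ContentStreamUpdateRequest("/update/extract");
      req.addFile(pdf, "application/pdf");       // Tika extracts the text server-side
      req.setParam("literal.id", pdf.getName()); // every doc needs a unique key
      solr.request(req);
    }
    solr.commit(); // one commit after the whole batch
  }
}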



Thanks,

Sadika Amreen

Data Scientist

PYA Analytics





Solr query processing

2013-09-23 Thread Scott Smith
I just want to state a couple of things and hear someone say, "that's right".


1.   In a Solr query you can have multiple fq's, but only a single q.  And
yes, I can simply AND the multiple "q"s together.  Just want to avoid that if
I'm wrong.

2.   A subtler issue is that when a full query is executed, Solr must look
at the schema to see how each field was tokenized (or not) and the various
other filters applied to a field, so that it can properly transform field data
(e.g., tokenize the text, but not keywords).  As an aside, it would be nice if
the query parser could do the same thing in Lucene (I know, wrong forum :)).
Scott


How to sort over all documents by score after Result Grouping / Field Collapsing

2013-09-23 Thread go2jun
Hi, I have solr documents like this:

[sample documents stripped by the mail archive]

I know I can use solr Result Grouping / Field Collapsing to get the top 2
results per group by grouping on source_id. Within each group, documents are
sorted by score with a query like this:
http://localhost:8983/solr/select?q=bank&group.field=source_id&group=true&group.limit=2&group.main=true&sort=score

My question is:

1. Is it possible to sort over all documents by score after I do the above
grouping?
2. Are there any other ways to implement the above (using solr functions
directly)?
3. Is it possible to implement this by writing Java code, something like a
customized request handler?

Thanks in advance,

Jun 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-sort-over-all-documents-by-score-after-Result-Grouping-Field-Collapsing-tp4091593.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Hash range to shard assignment

2013-09-23 Thread Noble Paul നോബിള്‍ नोब्ळ्
Custom routers are an idea that has been floated around and is easy to implement.
It's just that we resist adding another extension point.

The point is we are planning other features which would obviate the
need for a custom router - something like splitting a shard by a query.
Will that be a good enough solution for you?





On Mon, Sep 23, 2013 at 2:52 PM, lochri  wrote:
> Thanks for the clarification.
>
> Still, I would think it is sub-optimal to split shards when we don't
> actually know which mailboxes we split. It may create splits of small users,
> which leads to unnecessary distribution of the smaller users.
>
> We thought about doing the routing ourselves. As far as I understood, we can
> do distributed searches across multiple collections. What do you think about
> this option?
>
> For the ideal solution: when will custom routers be supported?
>
> Regards,
> Lochri
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Hash-range-to-shard-assignment-tp4091204p4091503.html
> Sent from the Solr - User mailing list archive at Nabble.com.



-- 
-
Noble Paul


OpenNLP Analyzers not working properly

2013-09-23 Thread rashi gandhi
Hi,



I am working on OpenNLP with Solr. I have successfully applied the patch
LUCENE-2899-x.patch to the latest Solr code on branch_4x.

I designed some analyzers based on OpenNLP filters and tokenizers and indexed
some documents on those fields.

Searching on the OpenNLP fields is not consistent: I am not able to search
properly on these OpenNLP-designed fields in the Solr schema.xml.

Also, how can payloads be used for boosting documents?

Please help me on this.


Re: Get only those documents that are fully satisfied.

2013-09-23 Thread asuka
Hi Jack,
   I've been working with the following schema field analyzer:

[analyzer definition stripped by the mail archive]

Regarding the query, the one I'm using right now is:

[query stripped by the mail archive]

But with this query, I just get results based on the presence of any of the
words in the sentence.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Get-only-those-documents-that-are-fully-satisfied-tp4091531p4091565.html
Sent from the Solr - User mailing list archive at Nabble.com.


Complex query combining fq and q with join

2013-09-23 Thread marotosg
Hi all,

Thanks in advance for your help. I am trying to create a query joining two
cores using the {!join} functionality. I have two cores, "personcore" and
"personjobcore".

Person core schema: PersonID, Gender, Age
Company core schema: PersonJobID, PersonID, CompanyName, CompanyType, Address

I have to create a complex query like this one, joining both cores and
getting results only from Person:

(Gender:Male AND Company:IBM AND CompanyType:All) OR (Gender:Female AND
Address:United States)

I am finding it really hard to create this query using {!join}, as I have to
define several join sentences and use boolean operators within them. Is that
possible? This is an example of what I am trying:

http://localhost:8080/solr4/person/select/?&q=Gender:Male AND {!join
from=PersonID to=PersonID fromIndex=personjob}((CompanyName:ibm) AND
(CompanyType:All))
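One way to combine several join clauses with boolean operators is the nested
query syntax of the default lucene parser - a sketch only, assuming the
company fields and Address all live on the personjob side:

q=(Gender:Male AND _query_:"{!join from=PersonID to=PersonID
fromIndex=personjob}(CompanyName:IBM AND CompanyType:All)") OR
(Gender:Female AND _query_:"{!join from=PersonID to=PersonID
fromIndex=personjob}Address:\"United States\"")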



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Complez-query-combining-fq-and-q-with-join-tp4091563.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Facet search on a docvalue field in a multi shard collection

2013-09-23 Thread Trym R. Møller

Hi

I have created https://issues.apache.org/jira/browse/SOLR-5260 as
proposed by Erick.
I hope anyone working with doc values can point me in a direction for how
to solve the bug.


Best regards Trym

On 23-09-2013 16:01, Erick Erickson wrote:

I haven't dived into the code, but it sure looks like a JIRA to me,
can you open one?

Best,
Erick

On Mon, Sep 23, 2013 at 1:48 AM, "Trym R. Møller"  wrote:

Hi Erick

Thanks for your input.

I have retrieved and built the branch
http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_5
Doing the same setup as in my previous post (two shard collection, fieldA of
docValue type, index a single document and do a facet search on fieldA),
I now get the below exception. The cause (which is not visible from the
stacktrace) is the same as before: "Cannot use facet.mincount=0 on field
fieldA which is not indexed"

What could be my next steps from here?

620710 [qtp1728933440-15] ERROR org.apache.solr.core.SolrCore ▒
org.apache.solr.common.SolrException: Exception during facet.field: fieldA.
at org.apache.solr.request.SimpleFacets$2.call(SimpleFacets.java:569)
at org.apache.solr.request.SimpleFacets$2.call(SimpleFacets.java:554)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at org.apache.solr.request.SimpleFacets$1.execute(SimpleFacets.java:508)
at
org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:579)
at
org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:265)

at
org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:78)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)

at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)

at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:724)

Den 22-09-2013 17:09, Erick Erickson skrev:


right, I think you're running into a bug I remember going by. I can't
find it now, JIRA seems to be not responding. As I remember,
if a shard doesn't have a doc on it, you get an error.

Although why facet.limit should figure in here is a mystery to me,
maybe a coincidence?

Significant work has been done about not requiring values for
DocValues fields and stuff. Can you give a try on 4.x or the
soon-to-be-released 4.5?

Best,
Erick

On Sun, Sep 22, 2013 at 6:26 AM, "Trym R. Møller"  wrote:

Hi

I have a problem doing facet search on a doc value field in a multi shard
collection. Any ideas what I may be doing wrong?

My Solr schema specifies fieldA as a docvalue type and I have created a
two
shard collection usin

Re: Facet search on a docvalue field in a multi shard collection

2013-09-23 Thread Erick Erickson
I haven't dived into the code, but it sure looks like a JIRA to me,
can you open one?

Best,
Erick

On Mon, Sep 23, 2013 at 1:48 AM, "Trym R. Møller"  wrote:
> Hi Erick
>
> Thanks for your input.
>
> I have retrieved and built the branch
> http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_5
> Doing the same setup as in my previous post (two shard collection, fieldA of
> docValue type, index a single document and do a facet search on fieldA),
> I now get the below exception. The cause (which is not visible from the
> stacktrace) is the same as before: "Cannot use facet.mincount=0 on field
> fieldA which is not indexed"
>
> What could be my next steps from here?
>
> 620710 [qtp1728933440-15] ERROR org.apache.solr.core.SolrCore ▒
> org.apache.solr.common.SolrException: Exception during facet.field: fieldA.
> at org.apache.solr.request.SimpleFacets$2.call(SimpleFacets.java:569)
> at org.apache.solr.request.SimpleFacets$2.call(SimpleFacets.java:554)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at org.apache.solr.request.SimpleFacets$1.execute(SimpleFacets.java:508)
> at
> org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:579)
> at
> org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:265)
>
> at
> org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:78)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
>
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> at org.eclipse.jetty.server.Server.handle(Server.java:368)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> at
> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
> at
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
> at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
>
> at
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
> at
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
> at java.lang.Thread.run(Thread.java:724)
>
> On 22-09-2013 17:09, Erick Erickson wrote:
>
>> right, I think you're running into a bug I remember going by. I can't
>> find it now, JIRA seems to be not responding. As I remember,
>> if a shard doesn't have a doc on it, you get an error.
>>
>> Although why facet.limit should figure in here is a mystery to me,
>> maybe a coincidence?
>>
>> Significant work has been done about not requiring values for
>> DocValues fields and stuff. Can you give a try on 4.x or the
>> soon-to-be-released 4.5?
>>
>> Best,
>> Erick
>>
>> On Sun, Sep 22, 2013 at 6:26 AM, "Trym R. Møller"  wrote:
>>>
>>> Hi
>>>
>>> I have a problem doing facet search on a doc value field in a multi shard
>>> collection. Any ideas what I may be doing wrong?
>>>
>>> My Solr schema specifies fieldA as a docvalue type and I have created a
>>> two
>>> shard collection using Solr 

Re: solr - searching part of words

2013-09-23 Thread Jack Krupansky
Solr is very flexible and you can configure it in lots of amazing ways. You 
need to start by carefully specifying the rules that you wish to 
implement. Is the numeric part the boundary, or do you want to support arbitrary 
prefixes, or... what? Be specific, because that determines which features of 
Solr to use and precisely how to use them.


The word delimiter filter and edge n-gram filter are possible tools to use 
in such cases.
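
For example, a prefix-matching field type built on the edge n-gram filter
might look like this - a sketch, with the type name and gram sizes chosen
arbitrarily:

<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With this, the indexed token "one1" also produces the gram "one", so a query
for "one" would match both documents.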


-- Jack Krupansky

-Original Message- 
From: Mysurf Mail

Sent: Monday, September 23, 2013 3:34 AM
To: solr-user@lucene.apache.org
Subject: solr - searching part of words

My field is defined as

[field definition stripped by the mail archive]

*text_en is defined as in the original schema.xml that comes with solr

Now, my field has the following values

  - "one"
  - "one1"

Searching for "one" returns only the document containing "one". What causes
it? How can I change it?



Re: Get only those documents that are fully satisfied.

2013-09-23 Thread Jack Krupansky
It all depends on your query parameters and schema field type analyzers, of 
which you have told us nothing.


-- Jack Krupansky

-Original Message- 
From: asuka

Sent: Monday, September 23, 2013 7:57 AM
To: solr-user@lucene.apache.org
Subject: Get only those documents that are fully satisfied.

Hi,
  I've got a scenario similar to the following one.

doc 1: LP1 | PAUL MCCARTNEY | FLOWERS IN THE DIRT | 1989
doc 2: LP2 | ALICE IN CHAINS | DIRT | 1992
doc 3: LP3 | GUNS'N'ROSES | THE SPAGHETTI INCIDENT? | 1993

I can't picture how I can perform searches that give me, as a result, those
LPs where all of their NAME terms have been matched; for example, imagine I
search for "DIRT": I would like to get only the doc with ID LP2, since at
LP1 we've got the words "FLOWERS IN THE" that haven't been included in the
query.

If the query is "the dirt spaghetti incident?", I should get the docs LP2
and LP3.

Kind regards





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Get-only-those-documents-that-are-fully-satisfied-tp4091531.html
Sent from the Solr - User mailing list archive at Nabble.com. 



[DIH] Logging skipped documents

2013-09-23 Thread jerome . dupont

Hello,

I have a question: I index documents and a small part of them are skipped (I
am in onError="skip" mode).
I'm trying to get a list of them, in order to analyse what's wrong with
these documents.
Is there a way to get the list of skipped documents, plus some more
information? (My onError="skip" is in an XPathEntityProcessor; the name of
the file being processed would be OK.)


Regards,
---
Jérôme Dupont
Bibliothèque Nationale de France
Département des Systèmes d'Information
Tour T3 - Quai François Mauriac
75706 Paris Cedex 13
telephone: 33 (0)1 53 79 45 40
e-mail: jerome.dup...@bnf.fr
---




Get only those documents that are fully satisfied.

2013-09-23 Thread asuka
Hi,
   I've got a scenario similar to the following one.

doc 1: LP1 | PAUL MCCARTNEY | FLOWERS IN THE DIRT | 1989
doc 2: LP2 | ALICE IN CHAINS | DIRT | 1992
doc 3: LP3 | GUNS'N'ROSES | THE SPAGHETTI INCIDENT? | 1993

I can't picture how I can perform searches that give me, as a result, those
LPs where all of their NAME terms have been matched; for example, imagine I
search for "DIRT": I would like to get only the doc with ID LP2, since at
LP1 we've got the words "FLOWERS IN THE" that haven't been included in the
query.

If the query is "the dirt spaghetti incident?", I should get the docs LP2
and LP3.

Kind regards





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Get-only-those-documents-that-are-fully-satisfied-tp4091531.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problems with gaps removed with SynonymFilter

2013-09-23 Thread Michael McCandless
Unfortunately the current SynonymFilter cannot handle posInc != 1 ...
we could perhaps try to fix this ... patches welcome :)

So for now it's best to place SynonymFilter before StopFilter, and
before any other filters that may create graph tokens (posLen > 1,
posInc == 0).
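
In schema.xml terms, that ordering looks like this (a sketch; the filter
attributes are taken from the analysis output quoted below):

<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="sinonimos_intranet.txt"
          ignoreCase="true" expand="true"/>
  <filter class="solr.StopFilterFactory" words="stopwords_intranet.txt"
          ignoreCase="true" enablePositionIncrements="true"/>
</analyzer>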

Mike McCandless

http://blog.mikemccandless.com


On Mon, Sep 23, 2013 at 2:45 AM,   wrote:
> Hi,
>
> I am having a problem applying StopFilterFactory and
> SynonymFilterFactory. The problem is that SynonymFilter removes the gaps
> that were previously put in by the StopFilterFactory. I'm applying filters
> at query time, because users need to change synonym lists frequently.
>
> This is my schema, and an example of the issue:
>
> String: "documentacion para agentes"
>
> org.apache.solr.analysis.WhitespaceTokenizerFactory {luceneMatchVersion=LUCENE_35}
> position      1               2       3
> term text     documentación   para    agentes
> startOffset   0               14      19
> endOffset     13              18      26
>
> org.apache.solr.analysis.LowerCaseFilterFactory {luceneMatchVersion=LUCENE_35}
> position      1               2       3
> term text     documentación   para    agentes
> startOffset   0               14      19
> endOffset     13              18      26
>
> org.apache.solr.analysis.StopFilterFactory {words=stopwords_intranet.txt,
> ignoreCase=true, enablePositionIncrements=true, luceneMatchVersion=LUCENE_35}
> position      1               3
> term text     documentación   agentes
> startOffset   0               19
> endOffset     13              26
>
> org.apache.solr.analysis.SynonymFilterFactory {synonyms=sinonimos_intranet.txt,
> expand=true, ignoreCase=true, luceneMatchVersion=LUCENE_35}
> position      1               2
> term text     documentación   agente
>               archivo         agentes
> type          SYNONYM         SYNONYM
>               SYNONYM         SYNONYM
> startOffset   0               19
>               0               19
> endOffset     13              26
>               13              26
>
>
> As you can see, the position should be 1 and 3, but SynonymFilter removes
> the gap and moves the token from position 3 to 2.
> I've got the same problem with Solr 3.5 and 4.0.
> I don't know if it's a bug or an error in my configuration. In other
> schemas that I have worked with, I had always put the SynonymFilter
> before the StopFilter, but in this one I preferred this order because of
> the large number of synonyms in the list (i.e. I don't want to generate
> a lot of synonyms for a word that I really wanted to remove).
>
> Thanks,
>
> David Dávila Atienza
> AEAT - Departamento de Informática Tributaria
> Subdirección de Tecnologías de Análisis de la Información e Investigación
> del Fraude
> Área de Infraestructuras
> Teléfono: 915831543
> Extensión: 31543


[Solr Join Score Mode] Query parser lack

2013-09-23 Thread Alessandro Benedetti
Hi guys,
I was studying the join feature in depth, and I noticed that in Solr the
join query parser does not do scoring.
If you add the parameter "scoreMode" it is completely ignored...

Checking the source code, it's possible to see that the join query is built
as follows:

public class JoinQParserPlugin extends QParserPlugin {



*JoinQuery jq = new JoinQuery(fromField, toField, fromIndex, fromQuery);*

And the JoinQuery object has no implementation regarding the score mode.

Looking in the Lucene code, we can find in the class

org.apache.lucene.search.join.JoinUtil a complete usage of the score mode param.
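
For reference, a sketch of the Lucene-side call being described; the field
names, query, and searcher here are illustrative only:

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.join.JoinUtil;
import org.apache.lucene.search.join.ScoreMode;

public class ScoringJoinExample {
  // Builds a join query whose "to"-side docs carry scores from the "from" side.
  public static Query scoringJoin(IndexSearcher fromSearcher) throws IOException {
    return JoinUtil.createJoinQuery(
        "fromField",             // join key on the "from" side
        true,                    // multiple values per document
        "toField",               // join key on the "to" side
        new MatchAllDocsQuery(), // query selecting the "from" documents
        fromSearcher,
        ScoreMode.Max);          // the score mode Solr's {!join} ignores
  }
}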

Why has it not been ported? Is anyone planning to change the JoinQParserPlugin?

https://issues.apache.org/jira/browse/LUCENE-4043

Looking here, it is possible to customize a query parser by wrapping
JoinUtil, but is there nothing official?

Cheers


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: SolrCloud - Path must not end with / character

2013-09-23 Thread capesonlee
Hi guys, I just hit this problem too. After reading the source code, I found
that collection1 was missing from the ZooKeeper configuration. You can solve
this problem by removing the version-2 folder of the ZooKeeper data and
initializing ZooKeeper again. Hope this helps.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Path-must-not-end-with-character-tp4087159p4091465.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: deploy issue on solr

2013-09-23 Thread Furkan KAMACI
Could you send your error?


2013/9/23 Ramesh 

> Unable to deploy Solr 4.4 on JBoss 4.0.0; I am getting an error like
>
>
>
>


deploy issue on solr

2013-09-23 Thread Ramesh
Unable to deploy Solr 4.4 on JBoss 4.0.0; I am getting an error like

[error message stripped by the mail archive]



Re: Hash range to shard assignment

2013-09-23 Thread lochri
Thanks for the clarification.

Still, I would think it is sub-optimal to split shards when we don't
actually know which mailboxes we split. It may create splits of small users,
which leads to unnecessary distribution of the smaller users.

We thought about doing the routing ourselves. As far as I understood, we can
do distributed searches across multiple collections. What do you think about
this option?

For the ideal solution: when will custom routers be supported?

Regards,
Lochri



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Hash-range-to-shard-assignment-tp4091204p4091503.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facet search on a docvalue field in a multi shard collection

2013-09-23 Thread Trym R. Møller

Hi Erick

Thanks for your input.

I have retrieved and built the branch
http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_5
Doing the same setup as in my previous post (two shard collection,
fieldA of docValue type, index a single document and do a facet
search on fieldA), I now get the below exception. The cause (which is
not visible from the stacktrace) is the same as before: "Cannot use
facet.mincount=0 on field fieldA which is not indexed"


What could be my next steps from here?

620710 [qtp1728933440-15] ERROR org.apache.solr.core.SolrCore ▒ 
org.apache.solr.common.SolrException: Exception during facet.field: fieldA.

at org.apache.solr.request.SimpleFacets$2.call(SimpleFacets.java:569)
at org.apache.solr.request.SimpleFacets$2.call(SimpleFacets.java:554)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at org.apache.solr.request.SimpleFacets$1.execute(SimpleFacets.java:508)
at 
org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:579)
at 
org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:265)
at 
org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:78)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)

at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)

at org.eclipse.jetty.server.Server.handle(Server.java:368)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at 
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)

at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)

at java.lang.Thread.run(Thread.java:724)

On 22-09-2013 17:09, Erick Erickson wrote:

right, I think you're running into a bug I remember going by. I can't
find it now, JIRA seems to be not responding. As I remember,
if a shard doesn't have a doc on it, you get an error.

Although why facet.limit should figure in here is a mystery to me,
maybe a coincidence?

Significant work has been done about not requiring values for
DocValues fields and stuff. Can you give a try on 4.x or the
soon-to-be-released 4.5?

Best,
Erick

On Sun, Sep 22, 2013 at 6:26 AM, "Trym R. Møller"  wrote:

Hi

I have a problem doing facet search on a doc value field in a multi shard
collection. Any ideas what I may be doing wrong?

My Solr schema specifies fieldA as a docvalue type and I have created a two
shard collection using Solr 4.4.0.
When I do a facet search on fieldA with a "large" facet.limit, the query
fails with the below exception.
A "large" facet.limit seems to be when (10 + (facet.limit * 1.5)) * number
of shards > rows matching my query.

The exception does not occur when I run with a single shard collection.
It can easily be reproduced by indexing a single row and queryin

Re: isolating solrcloud instance from peer updates

2013-09-23 Thread Anshum Gupta
Though, as Shalin said, there's no way to do it other than just taking it
off completely - can you specify the use case?


On Sun, Sep 22, 2013 at 3:17 AM, Aditya Sakhuja wrote:

> Hello all,
>
> Is there a way to isolate an active solr-cloud instance from all incoming
> replication update requests from peer nodes ?
>
> --
> Regards,
> -Aditya Sakhuja
>



-- 

Anshum Gupta
http://www.anshumgupta.net


Re: Storing/indexing speed drops quickly

2013-09-23 Thread Per Steffensen
Now running the tests on a slightly reduced setup (2 machines, quadcore,
8GB ram ...), but that doesn't matter.


We see that storing/indexing speed drops when using
IndexWriter.updateDocument in DirectUpdateHandler2.addDoc. But it does
not drop when just using IndexWriter.addDocument (update requests with
overwrite=false).
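For reference, that means update requests of this form, where the host,
collection, and file are placeholders:

curl "http://localhost:8983/solr/collection1/update?overwrite=false" -H
"Content-Type: text/xml" --data-binary @docs.xml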
Using addDocument: 
https://dl.dropboxusercontent.com/u/25718039/AddDocument_2Solr8GB_DocCount.png
Using updateDocument: 
https://dl.dropboxusercontent.com/u/25718039/UpdateDocument_2Solr8GB_DocCount.png
We are not too happy about having to use addDocument, because that 
allows for duplicates, and we would really want to avoid that (on 
Solr/Lucene level)


We have confirmed that doubling the amount of total RAM doubles the
number of documents in the index at which the indexing speed starts
dropping (when we use updateDocument).
On 
https://dl.dropboxusercontent.com/u/25718039/UpdateDocument_2Solr8GB_DocCount.png 
you can see that the speed drops at around 120M documents. Running the 
same test, but with Solr machine having 16GB RAM (instead of 8GB) the 
speed drops at around 240M documents.


Any comments on why indexing speed drops with IndexWriter.updateDocument 
but not with IndexWriter.addDocument?


Regards, Per Steffensen

On 9/12/13 10:14 AM, Per Steffensen wrote:

Seems like the attachments didn't make it through to this mailing list

https://dl.dropboxusercontent.com/u/25718039/doccount.png
https://dl.dropboxusercontent.com/u/25718039/iowait.png


On 9/12/13 8:25 AM, Per Steffensen wrote:

Hi

SolrCloud 4.0: 6 machines, quadcore, 8GB ram, 1T disk, one Solr-node 
on each, one collection across the 6 nodes, 4 shards per node
Storing/indexing from 100 threads on external machines, each thread 
one doc at a time, full speed (they always have a new doc to 
store/index)

See attached images
* iowait.png: Measured I/O wait on the Solr machines
* doccount.png: Measured number of doc in Solr collection

Starting from an empty collection, things are fine wrt 
storing/indexing speed for the first two-three hours (100M docs per 
hour); then speed goes down dramatically, to a level that is unacceptable 
for us (max 10M per hour). At the same time as speed goes down, we see 
that I/O wait increases dramatically. I am not 100% sure, but quick 
investigation has shown that this is due to almost constant merging.


What to do about this problem?
We know that you can play around with mergeFactor and commit rate, but 
earlier tests have shown that this does not really do the job - it 
might postpone the time when the problem occurs, but basically it is 
just a matter of time before merging exhausts the system.
Is there a way to totally avoid merging, and keep indexing speed at a 
high level, while still making sure that searches will perform fairly 
well when data amounts become big? (I guess without merging you will 
end up with lots and lots of "small" files, and I guess this is not 
good for search response time.)


Regards, Per Steffensen







Re: import partition table from oracle

2013-09-23 Thread YouPeng Yang
Hi  Shalin

 Thanks a lot. That is exactly what I need.
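
To spell out the pattern from that wiki link - a sketch only, assuming
Oracle's PARTITION syntax and a request parameter named "part":

<entity name="mytable"
        query="SELECT id, col1 FROM mytable PARTITION (${dataimporter.request.part})"/>

http://localhost:8983/solr/dataimport?command=full-import&part=p2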


Regards


2013/9/23 Shalin Shekhar Mangar 

> You can use request parameters in your query e.g.
>
> 
>
> http://wiki.apache.org/solr/DataImportHandler#Accessing_request_parameters
>
> On Mon, Sep 23, 2013 at 8:26 AM, YouPeng Yang 
> wrote:
> > Hi
> >
> >   I want to import the dataset in a partition of a partitioned table with
> > DIH, and I would like to explicitly define the partition when I run the
> > import job.
> >
> >  To be specific:
> >   1. I define the DIH configuration like this:
> >   [configuration stripped by the mail archive]
> >
> >   2. I send the url:
> >   http://localhost:8983/solr/dataimport?command=full-import&part=p2
> >
> >   and then the DIH handler will full-import the p2 partition of the table.
> >
> > Any suggestion will be appreciated.
> >
> >
> > Regards.
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


solr - searching part of words

2013-09-23 Thread Mysurf Mail
My field is defined as

[field definition stripped by the mail archive]

*text_en is defined as in the original schema.xml that comes with solr

Now, my field has the following values

   - "one"
   - "one1"

Searching for "one" returns only the document containing "one". What causes
it? How can I change it?


Re: How to define facet.prefix as case-insensitive

2013-09-23 Thread Mysurf Mail
thanks.


On Sun, Sep 22, 2013 at 6:24 PM, Erick Erickson wrote:

> You'll have to lowercase the term in your app and set
> terms.prefix to that value, there's no analysis done
> on the terms.prefix value.
>
> Best,
> Erick
>
> On Sun, Sep 22, 2013 at 4:07 AM, Mysurf Mail 
> wrote:
> > I am using facet.prefix for auto complete.
> > This is my definition
> >
> > [requestHandler definition stripped by the mail archive]
> >
> > this is my field
> >
> > [field definition stripped by the mail archive; required="false"
> > multiValued="true"]
> >
> > and
> >
> > [field type analyzer stripped by the mail archive]
> >
> > all works fine, but when I search using caps lock it doesn't return
> > answers. Even when the field contains capital letters - it doesn't.
> >
> > I assume that the field in solr is lowercased (from the field type filter
> > definition) but the search term is not.
> > How can I control the search term caps/no caps?
> >
> > Thanks.
>


Re: Spellchecking

2013-09-23 Thread Gastone Penzo
Thank you!!




2013/9/20 Dyer, James 

> If you're using "spellcheck.collate" you can also set
> "spellcheck.maxCollationTries" to validate each collation against the index
> before suggesting it.  This validation takes into account any "fq"
> parameters on your query, so if your original query has "fq=Product:Book",
> then the collations returned will all be vetted by internally running the
> query with that filter applied.
>
> If for some reason your main query does not have "fq=Product:Book", but
> you want it considered when collations are being built, you can include
> "spellcheck.collateParam.fq=Product:Book".
>
> See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collateand 
> following sections.
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Gastone Penzo [mailto:gastone.pe...@gmail.com]
> Sent: Friday, September 20, 2013 4:00 AM
> To: solr-user@lucene.apache.org
> Subject: Spellchecking
>
> Hi,
> I'd like to know if it is possible to have suggestions from only a part of
> the index.
> For example:
>
> an ecommerce site:
> there are a lot of types of products (book, dvd, cd..)
>
> If I search inside books, I want suggestions only for book products, not
> cds, but the spellcheck indexes are all together.
>
> Is it possible to divide the indexes or have suggestions for only one type?
>
> thanx
>
> --
> Gastone
>
>


-- 
*Gastone Penzo*