Re: Xml file is not inserting from code java -jar post.jar *.xml

2013-09-27 Thread Kishan Parmar
i have done indexing.
now i can search through a query,
but

how do i connect my solr to solrnet? i have downloaded the dll file of
solrnet
but i dont know where i should put it,
and the steps for installation ...plz..

and i am using visual studio 2010 for my .net work

Regards,

Kishan Parmar
Software Developer
+91 95 100 77394
Jay Shree Krishnaa !!



On Thu, Sep 26, 2013 at 5:27 PM, Erick Erickson wrote:

> Solr does not index arbitrary XML, it only indexes
> XML in a very specific format. You haven't
> shown an example of what you're trying to index.
>
> See the examples in example/exampledocs for the
> format required.
>
> Best,
> Erick
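
For reference, a minimal document in the update XML format that post.jar expects
looks roughly like this (the field names are illustrative and must match your schema):

<add>
  <doc>
    <field name="id">doc1</field>
    <field name="title">An example title</field>
  </doc>
</add>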
>
> On Thu, Sep 26, 2013 at 8:32 AM, Furkan KAMACI 
> wrote:
> > You should start to read from here:
> > http://lucene.apache.org/solr/4_4_0/tutorial.html
> >
> >
> > 2013/9/26 Kishan Parmar 
> >
> >>
> >>
> http://www.coretechnologies.com/products/AlwaysUp/Apps/RunApacheSolrAsAService.html
> >> \
> >>
> >> this is the link from where i found the solr installation
> >>
> >> Regards,
> >>
> >> Kishan Parmar
> >> Software Developer
> >> +91 95 100 77394
> >> Jay Shree Krishnaa !!
> >>
> >>
> >>
> >> On Thu, Sep 26, 2013 at 1:13 PM, Kishan Parmar 
> >> wrote:
> >>
> >> > i am not using tomcat  but i am using alwaysup software to run the
> solr
> >> > system.
> >> > it is working perfectly
> >> >
> >> > but i can not add my xml file to the index. i changed my schema file as
> per
> >> > the requirements of my xml file ...
> >> > and also
> >> > i am using this command to insert xml to index
> >> >
> >> >
> >> > java -Durl=http://localhost:8983/solr/core0/update -jar post.jar
> *.xml
> >> >
> >> > but it gives an error, and if i write java -jar post.jar *.xml then
> it
> >> > indexes the data but in another core, collection1,
> >> > and
> >> > there is also an error in it that "no dataimport handler is found";
> >> > so what can i do for these problems
> >> >
> >> > Regards,
> >> >
> >> > Kishan Parmar
> >> > Software Developer
> >> > +91 95 100 77394
> >> > Jay Shree Krishnaa !!
> >> >
> >> >
> >> >
> >> > On Sun, Sep 22, 2013 at 8:53 PM, Erick Erickson <
> erickerick...@gmail.com
> >> >wrote:
> >> >
> >> >> Please review:
> >> >>
> >> >> http://wiki.apache.org/solr/UsingMailingLists
> >> >>
> >> >> Best,
> >> >> Erick
> >> >>
> >> >> On Sun, Sep 22, 2013 at 8:06 AM, Jack Krupansky <
> >> j...@basetechnology.com>
> >> >> wrote:
> >> >> > Did you start Solr? How did you verify that Solr is running? Are
> you
> >> >> able to
> >> >> > query Solr and access the Admin UI?
> >> >> >
> >> >> > Most importantly, did you successfully complete the standard Solr
> >> >> tutorial?
> >> >> > (IOW, you know all the necessary steps for basic operation of
> Solr.)
> >> >> >
> >> >> > Lastly, did you verify (by examining the log) whether Solr was
> able to
> >> >> > successfully load your schema changes without errors?
> >> >> >
> >> >> > -- Jack Krupansky
> >> >> >
> >> >> > -Original Message- From: Kishan Parmar
> >> >> > Sent: Sunday, September 22, 2013 9:56 AM
> >> >> > To: solr-user@lucene.apache.org
> >> >> > Subject: Xml file is not inserting from code java -jar post.jar
> *.xml
> >> >> >
> >> >> >
> >> >> > hi
> >> >> >
> >> >> > i am a new user of Solr. i have done my schema file and when i write a
> >> >> code to
> >> >> > insert an xml file to the index from cmd: java -jar post.jar *.xml
> >> >> >
> >> >> > it gives us the error: solr returned error 404 not found
> >> >> >
> >> >> > what can i do???
> >> >> >
> >> >> >
> >> >> > Regards,
> >> >> >
> >> >> > Kishan Parmar
> >> >> > Software Developer
> >> >> > +91 95 100 77394
> >> >> > Jay Shree Krishnaa !!
> >> >>
> >> >
> >> >
> >>
>


Re: Hello and help :)

2013-09-27 Thread Upayavira
To phrase your need more generically:

 * find all documents for userID=x, where userID=x has more than y
 documents in the index

Is that correct?

If it is, I'd probably do some work at index time. First guess, I'd keep
a separate core, which has a very small document per user, storing just:

 * userID
 * docCount

Then, when you add/delete a document, you use atomic updates to either
increase or decrease the docCount on that user doc.
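
As a sketch, that atomic update could be sent like this (assuming a JSON update
request against the user core, here named "users", and the field names above;
"inc" adds to the stored value):

curl 'http://localhost:8983/solr/users/update' -H 'Content-Type: application/json' \
  -d '[{"userID": "x", "docCount": {"inc": 1}}]'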

Then you can use a pseudo join between these two cores relatively
easily.

q=user_id:x {!join fromIndex=user from=user_id to=user_id}+user_id:x
+doc_count:[y TO *]

Worst case, if you don't want to mess with your indexing code, I wonder
if you could use a ScriptUpdateProcessor to do this work - not sure if
you can have one add an entirely new, additional, document to the list,
but may be possible.
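
If that route works, the chain might be wired up roughly like this (a sketch using
the stock StatelessScriptUpdateProcessorFactory; the script file name is an
assumption):

<updateRequestProcessorChain name="doccount">
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <str name="script">update-doccount.js</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>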

Upayavira

On Fri, Sep 27, 2013, at 09:50 PM, Matheus Salvia wrote:
> Sure, sorry for the inconvenience.
> 
> I'm having a little trouble trying to make a query in Solr. The problem
> is:
> I must be able to retrieve documents that have the same value for a
> specified
> field, but they should only be retrieved if this value appeared more than
> X
> times for a specified user. In pseudo-SQL it would be something like:
> 
> select user_id from documents
> where my_field="my_value"
> and
> (select count(*) from documents where my_field="my_value" and
> user_id=super.user_id) > X
> 
> I know that solr returns a 'numFound' for each query you make, but I don't
> know how to retrieve this value in a subquery.
> 
> My Solr is organized in a way that a user is a document, and the
> properties
> of the user (such as name, age, etc) are grouped in another document with
> a
> 'root_id' field. So let's suppose the following query that gets all the
> root
> documents whose children have the prefix "some_prefix".
> 
> is_root:true AND _query_:"{!join from=root_id
> to=id}requests_prefix:\"some_prefix\""
> 
> Now, how can I get the root documents (users in some sense) that have
> more
> than X children matching 'requests_prefix:"some_prefix"' or any other
> condition? Is it possible?
> 
> P.S. It must be done in a single query, fields can be added at will, but
> the root/children structure should be preserved (preferentially).
> 
> 
> 2013/9/27 Upayavira 
> 
> > Matheus,
> >
> > Given these mails form a part of an archive that are themselves
> > self-contained, can you please post your actual question here? You're
> > more likely to get answers that way.
> >
> > Thanks, Upayavira
> >
> > On Fri, Sep 27, 2013, at 04:36 PM, Matheus Salvia wrote:
> > > Hello everyone,
> > > I'm having a problem regarding how to make a solr query, I've posted it
> > > on
> > > stackoverflow.
> > > Can someone help me?
> > >
> > http://stackoverflow.com/questions/19039099/apache-solr-count-of-subquery-as-a-superquery-parameter
> > >
> > > Thanks in advance!
> > >
> > > --
> > > --
> > >  // Matheus Salvia
> > > Desenvolvedor Mobile
> > > Celular: +55 11 9-6446-2332
> > > Skype: meta.faraday
> >
> 
> 
> 
> -- 
> --
>  // Matheus Salvia
> Desenvolvedor Mobile
> Celular: +55 11 9-6446-2332
> Skype: meta.faraday


Issue in parallel Indexing using multiple csv files

2013-09-27 Thread zaheer.java
Using SOLR 4.4

I'm trying to index solr core using a csv file of around 1 million records.
To improve the performance, I've split the csv file into smaller files and
tried to use the csv update handler for each file, running in a separate thread.
The outcome was weird: the total count of Solr documents doesn't match
the total number of records in the csv files. But when I run these in a
sequential manner, the outcome is as expected.

So, the question is: is it a good option to load these csv files in
parallel? Does it even work?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Issue-in-parallel-Indexing-using-multiple-csv-files-tp4092452.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception

2013-09-27 Thread Alexandre Rafalovitch
This is a rather complicated example to chew through, but try the following
two things:
*) dataField="${tika.text}"  => dataField="text" (or, less likely,
dataField="tika.text")
You might be trying to read the content of the field rather than passing a
reference to the field, which seems to be what is expected. This might explain the
exception.

*) It may help to be aware of
https://issues.apache.org/jira/browse/SOLR-4530 . There is a new
htmlMapper="identity" flag on Tika entries to ensure more of HTML structure
passing through. By default, Tika strips out most of the HTML tags.
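
(As a sketch, the flag goes directly on the Tika entity, for example
<entity name="tika" processor="TikaEntityProcessor" htmlMapper="identity"
format="html" url="${rec.urlParse}" dataSource="dataUrl"/>; every attribute
other than htmlMapper is just carried over from the config quoted below.)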

Regards,
   Alex.

On Thu, Sep 26, 2013 at 5:17 PM, Andreas Owen  wrote:

> <entity name="tika" processor="TikaEntityProcessor"
>  url="${rec.urlParse}" dataSource="dataUrl" onError="skip" format="html">
>
>   <entity processor="XPathEntityProcessor"
>    forEach="/html" dataSource="fld" dataField="${tika.text}" rootEntity="true"
>    onError="skip">
>
>   </entity>
> </entity>
>



Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


RE: Hello and help :)

2013-09-27 Thread Socratees Samipillai
Also, try the #solr and #solr-dev IRC channels at Freenode 
http://webchat.freenode.net/
Thanks,
— Socratees.

> From: ss...@outlook.com
> To: solr-user@lucene.apache.org
> Subject: RE: Hello and help :)
> Date: Fri, 27 Sep 2013 17:23:28 -0700
> 
> Hi Marcelo,
> I haven't faced this exact situation before so I can only try posting my 
> thoughts.
> Since Solr allows Result Grouping and Faceting at the same time, and since 
> you can apply filters on these facets, can you take advantage of that?
> Or, What if you can facet by the field, and group by the field count, then 
> apply facet filtering to exclude all filters with count less than 5? 
> These links might be helpful.
> http://architects.dzone.com/articles/facet-over-same-field-multiple
> https://issues.apache.org/jira/browse/SOLR-2898
> Thanks,
> — Socratees.
> 
> > Date: Fri, 27 Sep 2013 20:32:22 -0300
> > Subject: Re: Hello and help :)
> > From: marc...@s1mbi0se.com.br
> > To: solr-user@lucene.apache.org
> > 
> > Ssami,
> > 
> > I work with Matheus and I am helping him to take a look at this
> > problem. We took a look at result grouping, thinking it could help us, but
> > it has two drawbacks:
> > 
> >- We cannot have multivalued fields, if I understood it correctly. But
> >ok, we could manage that...
> >- Suppose some query like that:
> >   - select count(*) NUMBER group by FIELD where CONDITION AND NUMBER > 5
> >   - In this case, we are not just taking the count for each group as a
> >   result. The count actually makes part of the where clause.
> >   - AFAIK, result grouping doesn't allow that, although I would really
> >   love to be proven wrong :D
> > 
> > We really need this, so I am trying to figure out what I could change in
> > solr to make this work... Any hint on that? Would we need to write a custom
> > facet / search handler / search component ? Of course we prefer a solution
> > that works with current solr features, but we could consider writing some
> > custom code to do that
> > 
> > Thanks in advance!
> > 
> > Best regards,
> > Marcelo Valle.
> > 
> > 
> > 2013/9/27 ssami 
> > 
> > > If I understand your question right, Result Grouping in Solr might help
> > > you.
> > >
> > > Refer  here
> > >   .
> > >
> > >
> > >
> > >
> > >
> > > --
> > > View this message in context:
> > > http://lucene.472066.n3.nabble.com/Hello-and-help-tp4092371p4092439.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> 
  

RE: Hello and help :)

2013-09-27 Thread Socratees Samipillai
Hi Marcelo,
I haven't faced this exact situation before so I can only try posting my 
thoughts.
Since Solr allows Result Grouping and Faceting at the same time, and since you 
can apply filters on these facets, can you take advantage of that?
Or, what if you can facet by the field, and group by the field count, then
apply facet filtering to exclude all facet values with a count of less than 5?
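
As a rough sketch of that, facet.mincount drops facet values below a threshold
(assuming the field is named my_field; rows=0 just suppresses the document list):

http://localhost:8983/solr/collection1/select?q=*:*&rows=0&facet=true&facet.field=my_field&facet.mincount=5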
These links might be helpful.
http://architects.dzone.com/articles/facet-over-same-field-multiple
https://issues.apache.org/jira/browse/SOLR-2898
Thanks,
— Socratees.

> Date: Fri, 27 Sep 2013 20:32:22 -0300
> Subject: Re: Hello and help :)
> From: marc...@s1mbi0se.com.br
> To: solr-user@lucene.apache.org
> 
> Ssami,
> 
> I work with Matheus and I am helping him to take a look at this
> problem. We took a look at result grouping, thinking it could help us, but
> it has two drawbacks:
> 
>- We cannot have multivalued fields, if I understood it correctly. But
>ok, we could manage that...
>- Suppose some query like that:
>   - select count(*) NUMBER group by FIELD where CONDITION AND NUMBER > 5
>   - In this case, we are not just taking the count for each group as a
>   result. The count actually makes part of the where clause.
>   - AFAIK, result grouping doesn't allow that, although I would really
>   love to be proven wrong :D
> 
> We really need this, so I am trying to figure out what I could change in
> solr to make this work... Any hint on that? Would we need to write a custom
> facet / search handler / search component ? Of course we prefer a solution
> that works with current solr features, but we could consider writing some
> custom code to do that
> 
> Thanks in advance!
> 
> Best regards,
> Marcelo Valle.
> 
> 
> 2013/9/27 ssami 
> 
> > If I understand your question right, Result Grouping in Solr might help
> > you.
> >
> > Refer  here
> >   .
> >
> >
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Hello-and-help-tp4092371p4092439.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
  

Re: Implementing Solr Suggester for Autocomplete (multiple columns)

2013-09-27 Thread Ing. Jorge Luis Betancourt Gonzalez
Actually, I don't use that field. It could be used to do some form of basic
collaborative filtering: you could use a high value for items in your
collection that you want to come first. But in my case this was not a
requirement, so I don't use it at all.

- Mensaje original -
De: "JMill" 
Para: solr-user@lucene.apache.org
Enviados: Viernes, 27 de Septiembre 2013 16:19:40
Asunto: Re: Implementing Solr Suggester for Autocomplete (multiple columns)

I am not sure about the value to use for the option "popularity".  Is there
a method or do you just go with some arbitrary number?

On Thursday, September 26, 2013, Ing. Jorge Luis Betancourt Gonzalez <
jlbetanco...@uci.cu> wrote:
> Great!! I hadn't seen your message yet. Perhaps you could create a PR to
that Github repository, so it will be in sync with current versions of
Solr.
>
> - Mensaje original -
> De: "JMill" 
> Para: solr-user@lucene.apache.org
> Enviados: Jueves, 26 de Septiembre 2013 9:10:49
> Asunto: Re: Implementing Solr Suggester for Autocomplete (multiple
columns)
>
> solved.
>
>
> On Thu, Sep 26, 2013 at 1:50 PM, JMill 
wrote:
>
>> I managed to get rid of the query error by placing the jquery file in the
>> velocity folder and adding the line: "<script type="text/javascript"
src="#{url_for_solr}/admin/file?file=/velocity/jquery.min.js&contentType=text/javascript"></script>".
>> That has not solved the issues the console is showing a new error -
>> "[13:42:55.181] TypeError: $.browser is undefined @
>>
http://localhost:8983/solr/ac/admin/file?file=/velocity/jquery.autocomplete.js&contentType=text/javascript:90
".
>> Any ideas?
>>
>>
>> On Thu, Sep 26, 2013 at 1:12 PM, JMill wrote:
>>
>>> Do you know the directory the "#{url_root}" in <script type="text/javascript" src="#{url_root}/js/lib/
>>> jquery-1.7.2.min.js"></script> points to? and the same for
>>> "#{url_for_solr}" in <script src="#{url_for_solr}/js/lib/jquery-1.7.2.min.js"></script>
>>>
>>>
>>> On Wed, Sep 25, 2013 at 7:33 PM, Ing. Jorge Luis Betancourt Gonzalez <
>>> jlbetanco...@uci.cu> wrote:
>>>
 Try querying the core where the data has been imported, something like:

 http://localhost:8983/solr/suggestions/select?q=uc

 In the previous URL suggestions is the name I give to the core, so this
 should change, if you get results, then the problem could be the jquery
 dependency. I don't remember doing any change; as far as I know that js
 file is bundled with solr (at least in the 3.x version). perhaps you could
change
 it to the correct jquery version on solr 4.4, if you go into the admin
panel
 (in solr 3.6):

 http://localhost:8983/solr/admin/schema.jsp

 And inspect the loaded code, the required file (jquery-1.4.2.min.js)
 gets loaded in solr 4.4 it should load a similar file, but perhaps a
more
 recent version.

 Perhaps you could change that part to something like:

   <script type="text/javascript" src="#{url_root}/js/lib/jquery-1.7.2.min.js"></script>

 Which is used at least on a solr 4.1 that I have lying around here
 somewhere.

 In any case you can test the suggestions using the URL that I suggest
on
 the top of this mail, in that case you should be able to see the
possible
 results, of course in a less fancy way.

 - Mensaje original -
 De: "JMill" 
 Para: solr-user@lucene.apache.org
 Enviados: Miércoles, 25 de Septiembre 2013 13:59:32
 Asunto: Re: Implementing Solr Suggester for Autocomplete (multiple
 columns)

 Could it be the jquery library that is the problem?   I opened up
 solr-home/ac/conf/velocity/head.vm with an editor and I see a reference
 to
 the jquery library but I can't seem to find the directory referenced,
  line:  

Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception

2013-09-27 Thread Andreas Owen
ok i see what you're getting at, but why doesn't the following work:




i removed the tika-processor. what am i missing? i haven't found anything in
the wiki.


On 28. Sep 2013, at 12:28 AM, P Williams wrote:

> I spent some more time thinking about this.  Do you really need to use the
> TikaEntityProcessor?  It doesn't offer anything new to the document you are
> building that couldn't be accomplished by the XPathEntityProcessor alone
> from what I can tell.
> 
> I also tried to get the Advanced
> Parsing example to
> work without success.  There are some obvious typos (
> instead of ) and an odd order to the pieces ( is
> enclosed by ).  It also looks like
> FieldStreamDataSource is
> the one that is meant to work in this context. If Koji is still around
> maybe he could offer some help?  Otherwise this bit of erroneous
> instruction should probably be removed from the wiki.
> 
> Cheers,
> Tricia
> 
> $ svn diff
> Index:
> solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java
> ===
> ---
> solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java
> (revision 1526990)
> +++
> solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java
> (working copy)
> @@ -99,13 +99,13 @@
> runFullImport(getConfigHTML("identity"));
> assertQ(req("*:*"), testsHTMLIdentity);
>   }
> -
> +
>   private String getConfigHTML(String htmlMapper) {
> return
> "" +
> "  " +
> "  " +
> -" processor='TikaEntityProcessor' " +
> +" processor='TikaEntityProcessor' " +
> "   url='" +
> getFile("dihextras/structured.html").getAbsolutePath() + "' " +
> ((htmlMapper == null) ? "" : (" htmlMapper='" + htmlMapper +
> "'")) + ">" +
> "  " +
> @@ -114,4 +114,36 @@
> "";
> 
>   }
> +  private String[] testsHTMLH1 = {
> +  "//*[@numFound='1']"
> +  , "//str[@name='h1'][contains(.,'H1 Header')]"
> +  };
> +
> +  @Test
> +  public void testTikaHTMLMapperSubEntity() throws Exception {
> +runFullImport(getConfigSubEntity("identity"));
> +assertQ(req("*:*"), testsHTMLH1);
> +  }
> +
> +  private String getConfigSubEntity(String htmlMapper) {
> +return
> +"" +
> +"" +
> +"" +
> +"" +
> +" dataSource='bin' format='html' rootEntity='false'>" +
> +"" +
> +"" +
> +"" +
> +"" +
> +"" +
> +" dataSource='fld' dataField='tika.text' rootEntity='true' >" +
> +"" +
> +"" +
> +"" +
> +"" +
> +"" +
> +"";
> +  }
> +
> }
> Index:
> solr/contrib/dataimporthandler-extras/src/test-files/dihextras/solr/collection1/conf/dataimport-schema-no-unique-key.xml
> ===
> ---
> solr/contrib/dataimporthandler-extras/src/test-files/dihextras/solr/collection1/conf/dataimport-schema-no-unique-key.xml
>   (revision 1526990)
> +++
> solr/contrib/dataimporthandler-extras/src/test-files/dihextras/solr/collection1/conf/dataimport-schema-no-unique-key.xml
>   (working copy)
> @@ -194,6 +194,8 @@
>
>
>
> +   
> +   
> 
>  
>  
> 
> 
> I find the SqlEntityProcessor part particularly odd.  That's the default
> right?:
> 2405 T12 C1 oashd.SqlEntityProcessor.initQuery ERROR The query failed
> 'null' java.lang.RuntimeException: unsupported type : class java.lang.String
> at
> org.apache.solr.handler.dataimport.FieldStreamDataSource.getData(FieldStreamDataSource.java:89)
> at
> org.apache.solr.handler.dataimport.FieldStreamDataSource.getData(FieldStreamDataSource.java:1)
> at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
> at
> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
> at
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
> at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:469)
> at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:495)
> at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:408)
> at
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:323)
> at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:231)
> at
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
> at
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:476)
> at
> org.apache.solr.handler.dataimport.DataImportHandler.

Re: Hello and help :)

2013-09-27 Thread Marcelo Elias Del Valle
Ssami,

I work with Matheus and I am helping him to take a look at this
problem. We took a look at result grouping, thinking it could help us, but
it has two drawbacks:

   - We cannot have multivalued fields, if I understood it correctly. But
   ok, we could manage that...
   - Suppose some query like that:
  - select count(*) NUMBER group by FIELD where CONDITION AND NUMBER > 5
  - In this case, we are not just taking the count for each group as a
  result. The count actually makes part of the where clause.
  - AFAIK, result grouping doesn't allow that, although I would really
  love to be proven wrong :D

We really need this, so I am trying to figure out what I could change in
solr to make this work... Any hint on that? Would we need to write a custom
facet / search handler / search component ? Of course we prefer a solution
that works with current solr features, but we could consider writing some
custom code to do that

Thanks in advance!

Best regards,
Marcelo Valle.


2013/9/27 ssami 

> If I understand your question right, Result Grouping in Solr might help
> you.
>
> Refer  here
>   .
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Hello-and-help-tp4092371p4092439.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


RE: Hello and help :)

2013-09-27 Thread Socratees Samipillai
Sorry, I take it back. I overlooked that you have two different collections. 
Thanks,
— Socratees.

> Date: Fri, 27 Sep 2013 20:03:46 -0300
> Subject: Re: Hello and help :)
> From: matheus2...@gmail.com
> To: solr-user@lucene.apache.org
> 
> Yes, but how to use result grouping inside a join/subquery?
> 
> 
> 2013/9/27 ssami 
> 
> > If I understand your question right, Result Grouping in Solr might help
> > you.
> >
> > Refer  here
> >   .
> >
> >
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Hello-and-help-tp4092371p4092439.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> 
> 
> 
> -- 
> --
>  // Matheus Salvia
> Desenvolvedor Mobile
> Celular: +55 11 9-6446-2332
> Skype: meta.faraday
  

Re: Hello and help :)

2013-09-27 Thread Matheus Salvia
Yes, but how to use result grouping inside a join/subquery?


2013/9/27 ssami 

> If I understand your question right, Result Grouping in Solr might help
> you.
>
> Refer  here
>   .
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Hello-and-help-tp4092371p4092439.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
--
 // Matheus Salvia
Desenvolvedor Mobile
Celular: +55 11 9-6446-2332
Skype: meta.faraday


Re: Hello and help :)

2013-09-27 Thread ssami
If I understand your question right, Result Grouping in Solr might help you.

Refer  here
  .





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Hello-and-help-tp4092371p4092439.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception

2013-09-27 Thread P Williams
I spent some more time thinking about this.  Do you really need to use the
TikaEntityProcessor?  It doesn't offer anything new to the document you are
building that couldn't be accomplished by the XPathEntityProcessor alone
from what I can tell.

I also tried to get the Advanced
Parsing example to
work without success.  There are some obvious typos (
instead of ) and an odd order to the pieces ( is
enclosed by ).  It also looks like
FieldStreamDataSource is
the one that is meant to work in this context. If Koji is still around
maybe he could offer some help?  Otherwise this bit of erroneous
instruction should probably be removed from the wiki.

Cheers,
Tricia

$ svn diff
Index:
solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java
===
---
solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java
 (revision 1526990)
+++
solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java
 (working copy)
@@ -99,13 +99,13 @@
 runFullImport(getConfigHTML("identity"));
 assertQ(req("*:*"), testsHTMLIdentity);
   }
-
+
   private String getConfigHTML(String htmlMapper) {
 return
 "" +
 "  " +
 "  " +
-"" +
 "  " +
@@ -114,4 +114,36 @@
 "";

   }
+  private String[] testsHTMLH1 = {
+  "//*[@numFound='1']"
+  , "//str[@name='h1'][contains(.,'H1 Header')]"
+  };
+
+  @Test
+  public void testTikaHTMLMapperSubEntity() throws Exception {
+runFullImport(getConfigSubEntity("identity"));
+assertQ(req("*:*"), testsHTMLH1);
+  }
+
+  private String getConfigSubEntity(String htmlMapper) {
+return
+"" +
+"" +
+"" +
+"" +
+"" +
+"" +
+"" +
+"" +
+"" +
+"" +
+"" +
+"" +
+"" +
+"" +
+"" +
+"" +
+"";
+  }
+
 }
Index:
solr/contrib/dataimporthandler-extras/src/test-files/dihextras/solr/collection1/conf/dataimport-schema-no-unique-key.xml
===
---
solr/contrib/dataimporthandler-extras/src/test-files/dihextras/solr/collection1/conf/dataimport-schema-no-unique-key.xml
   (revision 1526990)
+++
solr/contrib/dataimporthandler-extras/src/test-files/dihextras/solr/collection1/conf/dataimport-schema-no-unique-key.xml
   (working copy)
@@ -194,6 +194,8 @@



+   
+   

  
  


I find the SqlEntityProcessor part particularly odd.  That's the default
right?:
2405 T12 C1 oashd.SqlEntityProcessor.initQuery ERROR The query failed
'null' java.lang.RuntimeException: unsupported type : class java.lang.String
at
org.apache.solr.handler.dataimport.FieldStreamDataSource.getData(FieldStreamDataSource.java:89)
 at
org.apache.solr.handler.dataimport.FieldStreamDataSource.getData(FieldStreamDataSource.java:1)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
 at
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
 at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:469)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:495)
 at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:408)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:323)
 at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:231)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
 at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:476)
at
org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:179)
 at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
 at org.apache.solr.util.TestHarness.query(TestHarness.java:291)
at
org.apache.solr.handler.dataimport.AbstractDataImportHandlerTestCase.runFullImport(AbstractDataImportHandlerTestCase.java:96)
 at
org.apache.solr.handler.dataimport.TestTikaEntityProcessor.testTikaHTMLMapperSubEntity(TestTikaEntityProcessor.java:124)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:601)
at
com.carrotsearch.randomizedtesting.Randomize

Re: Solr doesn't return TermVectors

2013-09-27 Thread Shawn Heisey

On 9/27/2013 4:02 PM, Jack Krupansky wrote:

You are using "components" instead of "last-components", so you have to
all search components, including the QueryComponent. Better to use
"last-components".


That did it.  Thank you!  I didn't know why this was a problem even with 
your note, until I read the last part of this page, which says that 
using "components" will entirely replace the default component list with 
what you specify:


http://wiki.apache.org/solr/SearchComponent
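
(In other words, the fix is just renaming the array in the handler definition,
a sketch of the corrected section:

<arr name="last-components">
  <str>tvComponent</str>
</arr>

so the default components still run and tvComponent is appended after them.)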

I copied and modified the handler from one I've already got that's using 
TermsComponent, which was using components instead of last-components. 
That handler works, so I figured it would for /tv as well. :)


Thanks,
Shawn



Re: Solr doesn't return TermVectors

2013-09-27 Thread Jack Krupansky
You are using "components" instead of "last-components", so you have to all 
search components, including the QueryComponent. Better to use 
"last-components".


-- Jack Krupansky

-Original Message- 
From: Shawn Heisey

Sent: Friday, September 27, 2013 4:02 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr doesn't return TermVectors

On 9/27/2013 1:35 PM, Jack Krupansky wrote:

You forgot the qt= parameter, such as on the wiki:

http://localhost:8983/solr/select/?&qt=tvrh&q=includes:[* TO *]&fl=id

And you need the custom request handler, such as on the wiki:

<requestHandler name="tvrh" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>

You can add that "last-components" list to your default handler, if you
wish.

I have more detailed examples in my e-book.


That wiki page probably needs to be updated to have a /tvrh handler
instead of tvrh, and with /tvrh instead of /select.  The 'qt' route is
the old way of doing things, before handleSelect="false" became the
accepted best practice.

In order to help the sender, I've been trying to get this working on my
dev server (running 4.4.0) and keep running into NullPointerException
problems.  I think there's something important that I'm missing about
how to use the component.  Here's an example of what my URL and request
handler are using:

http://server:port/solr/core/tv?q=id:someId&tv.fl=catchall

<requestHandler name="/tv" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="components">
    <str>tvComponent</str>
  </arr>
</requestHandler>

java.lang.NullPointerException at
org.apache.solr.handler.component.TermVectorComponent.process(TermVectorComponent.java:251)

Thanks,
Shawn 



Re: Hello and help :)

2013-09-27 Thread Matheus Salvia
Sure, sorry for the inconvenience.

I'm having a little trouble trying to make a query in Solr. The problem is:
I must be able to retrieve documents that have the same value for a specified
field, but they should only be retrieved if this value appeared more than X
times for a specified user. In pseudo-SQL it would be something like:

select user_id from documents
where my_field="my_value"
and
(select count(*) from documents where my_field="my_value" and
user_id=super.user_id) > X

I know that solr returns a 'numFound' for each query you make, but I don't
know how to retrieve this value in a subquery.

My Solr is organized in a way that a user is a document, and the properties
of the user (such as name, age, etc) are grouped in another document with a
'root_id' field. So let's suppose the following query that gets all the root
documents whose children have the prefix "some_prefix".

is_root:true AND _query_:"{!join from=root_id
to=id}requests_prefix:\"some_prefix\""

Now, how can I get the root documents (users in some sense) that have more
than X children matching 'requests_prefix:"some_prefix"' or any other
condition? Is it possible?

P.S. It must be done in a single query, fields can be added at will, but
the root/children structure should be preserved (preferentially).


2013/9/27 Upayavira 

> Matheus,
>
> Given these mails form a part of an archive that are themselves
> self-contained, can you please post your actual question here? You're
> more likely to get answers that way.
>
> Thanks, Upayavira
>
> On Fri, Sep 27, 2013, at 04:36 PM, Matheus Salvia wrote:
> > Hello everyone,
> > I'm having a problem regarding how to make a solr query, I've posted it
> > on
> > stackoverflow.
> > Can someone help me?
> >
> http://stackoverflow.com/questions/19039099/apache-solr-count-of-subquery-as-a-superquery-parameter
> >
> > Thanks in advance!
> >
> > --
> > --
> >  // Matheus Salvia
> > Desenvolvedor Mobile
> > Celular: +55 11 9-6446-2332
> > Skype: meta.faraday
>



-- 
--
 // Matheus Salvia
Desenvolvedor Mobile
Celular: +55 11 9-6446-2332
Skype: meta.faraday


Re: Cross index join query performance

2013-09-27 Thread Peter Keegan
Hi Joel,

I tried this patch and it is quite a bit faster. Using the same query on a
larger index (500K docs), the 'join' QTime was 1500 msec, and the 'hjoin'
QTime was 100 msec! This was true for large and small result sets.

A few notes: the patch didn't compile with 4.3 because of the
SolrCore.getLatestSchema call (which I worked around), and the package name
should be:


Unfortunately, I just learned that our uniqueKey may have to be an
alphanumeric string instead of an int, so I'm not out of the woods yet.

Good stuff - thanks.

Peter


On Thu, Sep 26, 2013 at 6:49 PM, Joel Bernstein  wrote:

> It looks like you are using int join keys so you may want to check out
> SOLR-4787, specifically the hjoin and bjoin.
>
> These perform well when you have a large number of results from the
> fromIndex. If you have a small number of results in the fromIndex the
> standard join will be faster.
>
>
> On Wed, Sep 25, 2013 at 3:39 PM, Peter Keegan  >wrote:
>
> > I forgot to mention - this is Solr 4.3
> >
> > Peter
> >
> >
> >
> > On Wed, Sep 25, 2013 at 3:38 PM, Peter Keegan  > >wrote:
> >
> > > I'm doing a cross-core join query and the join query is 30X slower than
> > > each of the 2 individual queries. Here are the queries:
> > >
> > > Main query: http://localhost:8983/solr/mainindex/select?q=title:java
> > > QTime: 5 msec
> > > hit count: 1000
> > >
> > > Sub query: http://localhost:8983/solr/subindex/select?q=+fld1:[0.1 TO
> > 0.3]
> > > QTime: 4 msec
> > > hit count: 25K
> > >
> > > Join query:
> > >
> >
> http://localhost:8983/solr/mainindex/select?q=title:java&fq={!join fromIndex=mainindex toIndex=subindex from=docid to=docid}fld1:[0.1 TO 0.3]
> > > QTime: 160 msec
> > > hit count: 205
> > >
> > > Here are the index spec's:
> > >
> > > mainindex size: 117K docs, 1 segment
> > > mainindex schema:
> > > > > required="true" multiValued="false" />
> > > > > stored="true" multiValued="false" />
> > >docid
> > >
> > > subindex size: 117K docs, 1 segment
> > > subindex schema:
> > > > > required="true" multiValued="false" />
> > > > > required="false" multiValued="false" />
> > >docid
> > >
> > > With debugQuery=true I see:
> > >   "debug":{
> > > "join":{
> > >   "{!join from=docid to=docid fromIndex=subindex}fld1:[0.1 TO
> 0.3]":{
> > > "time":155,
> > > "fromSetSize":24742,
> > > "toSetSize":24742,
> > > "fromTermCount":117810,
> > > "fromTermTotalDf":117810,
> > > "fromTermDirectCount":117810,
> > > "fromTermHits":24742,
> > > "fromTermHitsTotalDf":24742,
> > > "toTermHits":24742,
> > > "toTermHitsTotalDf":24742,
> > > "toTermDirectCount":24627,
> > > "smallSetsDeferred":115,
> > > "toSetDocsAdded":24742}},
> > >
> > > Via profiler and debugger, I see 150 msec spent in the outer
> > > 'while(term!=null)' loop in: JoinQueryWeight.getDocSet(). This seems
> > like a
> > > lot of time to join the bitsets. Does this seem right?
> > >
> > > Peter
> > >
> > >
> >
>
>
>
> --
> Joel Bernstein
> Professional Services LucidWorks
>


Re: Hello and help :)

2013-09-27 Thread Upayavira
Matheus,

Given these mails form a part of an archive that are themselves
self-contained, can you please post your actual question here? You're
more likely to get answers that way.

Thanks, Upayavira

On Fri, Sep 27, 2013, at 04:36 PM, Matheus Salvia wrote:
> Hello everyone,
> I'm having a problem regarding how to make a solr query, I've posted it
> on
> stackoverflow.
> Can someone help me?
> http://stackoverflow.com/questions/19039099/apache-solr-count-of-subquery-as-a-superquery-parameter
> 
> Thanks in advance!
> 
> -- 
> --
>  // Matheus Salvia
> Desenvolvedor Mobile
> Celular: +55 11 9-6446-2332
> Skype: meta.faraday


Re: Implementing Solr Suggester for Autocomplete (multiple columns)

2013-09-27 Thread JMill
I am not sure about the value to use for the option "popularity".  Is there
a method or do you just go with some arbitrary number?

On Thursday, September 26, 2013, Ing. Jorge Luis Betancourt Gonzalez <
jlbetanco...@uci.cu> wrote:
> Great!! I hadn't seen your message yet. Perhaps you could create a PR to
that Github repository, so it will be in sync with current versions of
Solr.
>
> - Mensaje original -
> De: "JMill" 
> Para: solr-user@lucene.apache.org
> Enviados: Jueves, 26 de Septiembre 2013 9:10:49
> Asunto: Re: Implementing Solr Suggester for Autocomplete (multiple
columns)
>
> solved.
>
>
> On Thu, Sep 26, 2013 at 1:50 PM, JMill 
wrote:
>
>> I managed to get rid of the query error by placing the jquery file in the
>> velocity folder and adding the line: "<script type="text/javascript"
src="#{url_for_solr}/admin/file?file=/velocity/jquery.min.js&contentType=text/javascript"></script>".
>> That has not solved the issues the console is showing a new error -
>> "[13:42:55.181] TypeError: $.browser is undefined @
>>
http://localhost:8983/solr/ac/admin/file?file=/velocity/jquery.autocomplete.js&contentType=text/javascript:90
".
>> Any ideas?
>>
>>
>> On Thu, Sep 26, 2013 at 1:12 PM, JMill wrote:
>>
>>> Do you know the directory the "#{url_root}" in <script type="text/javascript" src="#{url_root}/js/lib/
>>> jquery-1.7.2.min.js"></script> points to? and the same for
>>> "#{url_for_solr}" in <script src="#{url_for_solr}/js/lib/jquery-1.7.2.min.js"></script>
>>>
>>>
>>> On Wed, Sep 25, 2013 at 7:33 PM, Ing. Jorge Luis Betancourt Gonzalez <
>>> jlbetanco...@uci.cu> wrote:
>>>
 Try querying the core where the data has been imported, something like:

 http://localhost:8983/solr/suggestions/select?q=uc

 In the previous URL suggestions is the name I give to the core, so this
 should change, if you get results, then the problem could be the jquery
 dependency. I don't remember doing any change; as far as I know that js
 file is bundled with solr (at least in the 3.x version). perhaps you could
change
 it to the correct jquery version on solr 4.4, if you go into the admin
panel
 (in solr 3.6):

 http://localhost:8983/solr/admin/schema.jsp

 And inspect the loaded code, the required file (jquery-1.4.2.min.js)
 gets loaded in solr 4.4 it should load a similar file, but perhaps a
more
 recent version.

 Perhaps you could change that part to something like:

   <script type="text/javascript" src="#{url_root}/js/lib/jquery-1.7.2.min.js"></script>

 Which is used at least on a solr 4.1 that I have lying around here
 somewhere.

 In any case you can test the suggestions using the URL that I suggest
on
 the top of this mail, in that case you should be able to see the
possible
 results, of course in a less fancy way.

 - Mensaje original -
 De: "JMill" 
 Para: solr-user@lucene.apache.org
 Enviados: Miércoles, 25 de Septiembre 2013 13:59:32
 Asunto: Re: Implementing Solr Suggester for Autocomplete (multiple
 columns)

 Could it be the jquery library that is the problem?   I opened up
 solr-home/ac/conf/velocity/head.vm with an editor and I see a reference
 to
 the jquery library but I can't seem to find the directory referenced,
  line:  

Re: Solr doesn't return TermVectors

2013-09-27 Thread alibozorgkhan
Shawn !! That is it ! That fixed my problem, I changed name="tvrh" to
name="/tvrh" and used  http://localhost:8983/solr/mycol/tvrh instead and now
it is returning the term vectors !

Thanx man



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-doesn-t-return-TermVectors-tp4092397p4092413.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr doesn't return TermVectors

2013-09-27 Thread alibozorgkhan
Hi,

- This is the part I added to the solrconfig.xml:

<searchComponent name="tvComponent" class="org.apache.solr.handler.component.TermVectorComponent"/>

<requestHandler name="tvrh" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>

- This is the result:


  
0
0

  true
  *:*
  true
  tvrh
  xml


  
1

  iphone chair

1447362558901092352
  
  
2

  laptop macbook note

1447362568761901056
  
  
3

  iphone is an iphone !

1447362579746783232
  

  

- I don't see any logs about this query

Cheers




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-doesn-t-return-TermVectors-tp4092397p4092409.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr doesn't return TermVectors

2013-09-27 Thread Shawn Heisey

On 9/27/2013 1:35 PM, Jack Krupansky wrote:

You forgot the qt= parameter, such as on the wiki:

http://localhost:8983/solr/select/?&qt=tvrh&q=includes:[* TO *]&fl=id

And you need the custom request handler, such as on the wiki:

<requestHandler name="tvrh" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>

You can add that "last-components" list to your default handler, if you
wish.

I have more detailed examples in my e-book.


That wiki page probably needs to be updated to have a /tvrh handler 
instead of tvrh, and with /tvrh instead of /select.  The 'qt' route is 
the old way of doing things, before handleSelect="false" became the 
accepted best practice.
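
For example, a request against such a handler might look like this (a sketch,
assuming a core named collection1 and a handler registered as "/tvrh"; tv.fl
names a field that has termVectors enabled):

http://localhost:8983/solr/collection1/tvrh?q=*:*&fl=id&tv.fl=includes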


In order to help the sender, I've been trying to get this working on my 
dev server (running 4.4.0) and keep running into NullPointerException 
problems.  I think there's something important that I'm missing about 
how to use the component.  Here's an example of what my URL and request 
handler are using:


http://server:port/solr/core/tv?q=id:someId&tv.fl=catchall

<requestHandler name="/tv" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="components">
    <str>tvComponent</str>
  </arr>
</requestHandler>

java.lang.NullPointerException at 
org.apache.solr.handler.component.TermVectorComponent.process(TermVectorComponent.java:251)


Thanks,
Shawn



Re: Solr doesn't return TermVectors

2013-09-27 Thread Chris Hostetter

: 
http://localhost:8983/solr/mycol/select?q=id:1211&wt=json&indent=true&tv=true&qt=tvrh
: 
: I see all the fields associated with id:1211. I unloaded my collection using
: the "Core Admin" panel in solr, removed data and core.properties in my
: collection, added the core again and imported the data.
: 
: By "didn't work", I mean it returns everything I expect except the term
: vectors.

You've shown us:  
 - a request url
 - a field declaration

you have not shown us:
 - your solrconfig.xml showing the request handler configuration 
   (and requestDispatcher configuation)
 - the response you get from that request url
 - the log messaages you get when you hit that request url

these are all things that are pretty much mandatory for us to even 
begin to guess what might be going wrong for you...

https://wiki.apache.org/solr/UsingMailingLists

-Hoss


Re: Solr doesn't return TermVectors

2013-09-27 Thread alibozorgkhan
Hi Jack,

With this query:

http://localhost:8983/solr/mycol/select?q=id:1211&wt=json&indent=true&tv=true&qt=tvrh

I see all the fields associated with id:1211. I unloaded my collection using
the "Core Admin" panel in solr, removed data and core.properties in my
collection, added the core again and imported the data.

By "didn't work", I mean it returns everything I expect except the term
vectors.

Cheers



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-doesn-t-return-TermVectors-tp4092397p4092406.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr doesn't return TermVectors

2013-09-27 Thread Jack Krupansky

Show us the response you got.

If you did have everything set up 100% properly and are still not seeing 
term vectors, then maybe you had indexed the data before setting up the full 
config. In which case, you would simply need to reindex the data. In that 
case the term vector section would have indicated which fl fields did not
have term vectors.


As a general proposition "it didn't work" is an extremely unhelpful 
response - it gives us no clues as to what you are actually seeing.


-- Jack Krupansky

-Original Message- 
From: alibozorgkhan

Sent: Friday, September 27, 2013 3:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr doesn't return TermVectors

Thanks for your reply, I actually added that before and it didn't work. I
tried it again and no luck.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-doesn-t-return-TermVectors-tp4092397p4092403.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Solr doesn't return TermVectors

2013-09-27 Thread alibozorgkhan
Thanks for your reply, I actually added that before and it didn't work. I
tried it again and no luck.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-doesn-t-return-TermVectors-tp4092397p4092403.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr doesn't return TermVectors

2013-09-27 Thread Jack Krupansky

You forgot the qt= parameter, such as on the wiki:

http://localhost:8983/solr/select/?&qt=tvrh&q=includes:[* TO *]&fl=id

And you need the custom request handler, such as on the wiki:

<requestHandler name="tvrh" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>

You can add that "last-components" list to your default handler, if you 
wish.


I have more detailed examples in my e-book.

-- Jack Krupansky

-Original Message- 
From: alibozorgkhan

Sent: Friday, September 27, 2013 3:04 PM
To: solr-user@lucene.apache.org
Subject: Solr doesn't return TermVectors

I followed http://wiki.apache.org/solr/TermVectorComponent step by step but
with the following request, I don't get any term vectors:


http://localhost:8983/solr/mycol/select?q=id:1211&wt=json&indent=true&tv=true

Just to be sure, I have this in my schema:

   

In my solrconfig, I have this:

   

Could anyone help me what the problem could be? BTW the solr version is
4.4.0. Thanx



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-doesn-t-return-TermVectors-tp4092397.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Solr doesn't return TermVectors

2013-09-27 Thread alibozorgkhan
I followed http://wiki.apache.org/solr/TermVectorComponent step by step but
with the following request, I don't get any term vectors:

   
http://localhost:8983/solr/mycol/select?q=id:1211&wt=json&indent=true&tv=true

Just to be sure, I have this in my schema:



In my solrconfig, I have this:



Could anyone help me what the problem could be? BTW the solr version is
4.4.0. Thanx



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-doesn-t-return-TermVectors-tp4092397.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr MailEntityProcessor not indexing "Content-Type: multipart/mixed;" emails

2013-09-27 Thread Andrey Padiy
Hi,

Trying to use DIH and MailEntityProcessor, but I am unable to index emails
that have a "Content-Type: multipart/mixed;" or "Content-Type:
multipart/related;" header.

Solr logs show the correct number of emails in the inbox when the IMAP connection
is established, but only emails that are of "Content-Type: text/plain;" or
"Content-Type: text/html;" are indexed. No exceptions are thrown.

I am using out of the box example config that ships with solr-4-4.0 with
the following data-config.xml


  
  
  
  


Is this a known bug?

Thanks.


Re: Sum function causing error in solr

2013-09-27 Thread Tanu Garg
solr-4.3.1





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sum-function-causing-error-in-solr-tp4091901p4092342.html
Sent from the Solr - User mailing list archive at Nabble.com.



Hello and help :)

2013-09-27 Thread Matheus Salvia
Hello everyone,
I'm having a problem regarding how to make a solr query, I've posted it on
stackoverflow.
Can someone help me?
http://stackoverflow.com/questions/19039099/apache-solr-count-of-subquery-as-a-superquery-parameter

Thanks in advance!

-- 
--
 // Matheus Salvia
Desenvolvedor Mobile
Celular: +55 11 9-6446-2332
Skype: meta.faraday


Re: Solr Commit Time

2013-09-27 Thread Walter Underwood
Right, it could be minutes or hours. Are the documents five words of plain text
or 500 pages of PDF? Is there one simple field, or are you running multiple
fields for different languages, plus entity extraction? And so on.

Also, some people on this list don't know the term "lakh", it is better to use 
"100,000".

wunder

On Sep 27, 2013, at 6:10 AM, Erick Erickson wrote:

> No way to say. How have you configured your autowarming
> parameters for instance? Why do you care? What problem are you
> trying to solve? Solr automatically handles warming up
> searchers and switching to the new one after a commit.
> 
> Best,
> Erick
> 
> On Fri, Sep 27, 2013 at 7:56 AM, Prasi S  wrote:
>> Hi,
>> What would be the maximum commit time for indexing 1 lakh documents in solr
>> on a 32 gb machine.
>> 
>> 
>> 
>> Thanks,
>> Prasi







Re: Solr client 'Timeout was reached' ~ when new documents are inserted and commits are made.

2013-09-27 Thread Shawn Heisey
On 9/27/2013 8:37 AM, Shawn Heisey wrote:
> INFO  - 2013-09-27 08:27:00.806;
> org.apache.solr.update.processor.LogUpdateProcessor; [inclive]
> webapp=/solr path=/update params={wt=javabin&version=2}
> {add=[notimexpix438424 (144734108581888), notimexpix438425
> (1447341085825171456), notimexpix438426 (1447341085826220032),
> notimexpix438427 (1447341085826220033), notimexpix438428
> (1447341085827268608), notimexpix438429(1447341085828317184),
> notimexpix438430 (1447341085829365760), notimexpix438431
> (1447341085830414336), notimexpix438432 (1447341085831462912),
> notimexpix438433 (1447341085831462913), ... (66 adds)]} 0 181
> 
> INFO  - 2013-09-27 08:27:01.975;
> org.apache.solr.update.processor.LogUpdateProcessor; [inclive]
> webapp=/solr path=/update
> params={waitSearcher=true&commit=true&wt=javabin&version=2&softCommit=false}
> {commit=} 0 1065
> 
> Note that the QTime doesn't represent the total amount of time for the
> request, because it only measures the part that's under the control
> of the specific class that's generating the log - in this case
> LogUpdateProcessor.  It can't measure the time the servlet container
> takes to handle the HTTP conversation, or any part of the request that
> takes place in Solr classes called before or after LogUpdateProcessor.

I can illustrate the difference between QTime and the actual transaction
time by showing you the log entries from the application that correspond
exactly to the Solr log entries I shared:

INFO  - 2013-09-27 08:27:00.815; chain.c: Insert done, 66, time = 315
INFO  - 2013-09-27 08:27:01.976; chain.c: Commit done, time = 1161

The add request with 66 documents had a QTime of 181, but took 315
milliseconds.  The commit had a QTime of 1065, but actually took 1161
milliseconds.

Thanks,
Shawn



Re: Solr client 'Timeout was reached' ~ when new documents are inserted and commits are made.

2013-09-27 Thread Shawn Heisey
On 9/27/2013 7:41 AM, Rafał Radecki wrote:
> On the client side the timeout is set to 5s, but when I look in the solr log I see a QTime
> of less than 5000 (in ms). We use jetty to start the solr process; where should I
> look for directives connected with timeouts?

Five seconds is WAY too short a timeout for the entire http
conversation.  Generally a timeout is not required, but if you feel you
need to set one, set it in terms of minutes, with one minute as an
absolute minimum.
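
If the client is SolrJ, a sketch of more generous settings might look like this
(the URL and values are illustrative; both timeouts are in milliseconds):

import org.apache.solr.client.solrj.impl.HttpSolrServer;

// One minute to establish the TCP connection, ten minutes of socket
// read inactivity before the client gives up.
HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/inclive");
server.setConnectionTimeout(60000);
server.setSoTimeout(600000);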

Updates generally take longer than queries.  The amount of time taken
for the update itself is usually fairly small, but after a commit there
is usually cache warming, which depending on your configuration can take
quite a while.

I'm pretty sure that you won't see the QTime of update requests in the
log, at least not listed as "QTime" like it is on queries.  Here are two
entries from my log, one for the doc insert, the other for the commit.
I believe the last number is the QTime, but it doesn't *say* QTime.

INFO  - 2013-09-27 08:27:00.806;
org.apache.solr.update.processor.LogUpdateProcessor; [inclive]
webapp=/solr path=/update params={wt=javabin&version=2}
{add=[notimexpix438424 (144734108581888), notimexpix438425
(1447341085825171456), notimexpix438426 (1447341085826220032),
notimexpix438427 (1447341085826220033), notimexpix438428
(1447341085827268608), notimexpix438429(1447341085828317184),
notimexpix438430 (1447341085829365760), notimexpix438431
(1447341085830414336), notimexpix438432 (1447341085831462912),
notimexpix438433 (1447341085831462913), ... (66 adds)]} 0 181

INFO  - 2013-09-27 08:27:01.975;
org.apache.solr.update.processor.LogUpdateProcessor; [inclive]
webapp=/solr path=/update
params={waitSearcher=true&commit=true&wt=javabin&version=2&softCommit=false}
{commit=} 0 1065

Note that the QTime doesn't represent the total amount of time for the
request, because it only measures the part that's under the control
of the specific class that's generating the log - in this case
LogUpdateProcessor.  It can't measure the time the servlet container
takes to handle the HTTP conversation, or any part of the request that
takes place in Solr classes called before or after LogUpdateProcessor.

Thanks,
Shawn



Re: autocomplete_edge type split words

2013-09-27 Thread elisabeth benoit
Yes!

what I've done is set autoGeneratePhraseQueries to true for my field, then
give it a boost (bq=myAutocompleteEdgeNGramField:"my query with spaces"^50).
This only worked with autoGeneratePhraseQueries=true, for a reason I didn't
understand.

since when I did

q=myAutocompleteEdgeNGramField:"my query with spaces", I didn't need
autoGeneratePhraseQueries
set to true.

and, another thing is when I tried

q=myAutocompleteNGramField:(my query with spaces) OR
myAutocompleteEdgeNGramField:"my
query with spaces"

(with a request handler with edismax and default operator field = AND), the
request on myAutocompleteNGramField would OR the grams, so I had to put an
AND (myAutocompleteNGramField:(my AND query AND with AND spaces)), which
was pretty ugly.

I don't always understand what is exactly going on. If you have a pointer
to some text I could read to get more insights about this, please let me
know.

Thanks again,
Best regards,
Elisabeth




2013/9/27 Erick Erickson 

> Have you looked at "autoGeneratePhraseQueries"? That might help.
>
> If that doesn't work, you can always do something like add an OR clause
> like
> OR "original query"
> and optionally boost it high. But I'd start with the autoGenerate bits.
>
> Best,
> Erick
>
>
> On Fri, Sep 27, 2013 at 7:37 AM, elisabeth benoit
>  wrote:
> > Thanks for your answer.
> >
> > So I guess if someone wants to search on two fields, one with a phrase query
> > and one with a "normal" query (split into words), one has to find a way to
> > send the query twice: one with quotes and one without...
> >
> > Best regards,
> > Elisabeth
> >
> >
> > 2013/9/27 Erick Erickson 
> >
> >> This is a classic issue where there's confusion between
> >> the query parser and field analysis.
> >>
> >> Early in the process the query parser has to take the input
> >> and break it up. that's how, for instance, a query like
> >> text:term1 term2
> >> gets parsed as
> >> text:term1 defaultfield:term2
> >> This happens long before the terms get to the analysis chain
> >> for the field.
> >>
> >> So your only options are to either quote the string or
> >> escape the spaces.
> >>
> >> Best,
> >> Erick
> >>
> >> On Wed, Sep 25, 2013 at 9:24 AM, elisabeth benoit
> >>  wrote:
> >> > Hello,
> >> >
> >> > I am using solr 4.2.1 and I have an autocomplete_edge type defined in
> >> > schema.xml
> >> >
> >> >
> >> > <fieldType name="autocomplete_edge" class="solr.TextField">
> >> >   <analyzer type="index">
> >> >     <charFilter class="solr.MappingCharFilterFactory"
> >> >       mapping="mapping-ISOLatin1Accent.txt"/>
> >> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >> >     <filter class="solr.LowerCaseFilterFactory"/>
> >> >     <filter class="solr.PatternReplaceFilterFactory" pattern="..."
> >> >       replacement=" " replace="all"/>
> >> >     <filter class="solr.EdgeNGramFilterFactory" maxGramSize="..."
> >> >       minGramSize="1"/>
> >> >   </analyzer>
> >> >   <analyzer type="query">
> >> >     <charFilter class="solr.MappingCharFilterFactory"
> >> >       mapping="mapping-ISOLatin1Accent.txt"/>
> >> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >> >     <filter class="solr.LowerCaseFilterFactory"/>
> >> >     <filter class="solr.PatternReplaceFilterFactory" pattern="..."
> >> >       replacement=" " replace="all"/>
> >> >     <filter class="solr.PatternReplaceFilterFactory"
> >> >       pattern="^(.{30})(.*)?" replacement="$1" replace="all"/>
> >> >   </analyzer>
> >> > </fieldType>
> >> >
> >> > When I have a request with more than one word, for instance "rue de la",
> >> > my request doesn't match with my autocomplete_edge field unless I use
> >> > quotes around the query. In other words q=rue de la doesn't work and
> >> > q="rue de la" works.
> >> >
> >> > I've checked the request with debugQuery=on, and I can see that in the
> >> > first case the query is split into words, and I don't understand why,
> >> > since my field type uses KeywordTokenizerFactory.
> >> >
> >> > Does anyone have a clue on how I can request my field without using
> >> > quotes?
> >> >
> >> > Thanks,
> >> > Elisabeth
> >>
>


DIH - delta query and delta import query executes transformer twice

2013-09-27 Thread Lee Carroll
Hi. It looks like when a DIH entity has a deltaQuery and a deltaImportQuery
plus a transformer defined, the execution of both queries calls the
transformer. I was expecting it to be called only on the import query. Sure,
we can check for a null value or something and just return the row during the
delta query execution, but is there a better way of doing this, i.e. not
calling the transformer in the first place?
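
The guard I have in mind looks something like this (a rough sketch of a
custom Java transformer; "title" just stands in for one of our real
columns):

    import java.util.Map;
    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.Transformer;

    public class GuardedTransformer extends Transformer {
      @Override
      public Object transformRow(Map<String, Object> row, Context context) {
        // rows produced by the delta query only carry the primary key,
        // so pass them through untouched
        if (row.get("title") == null) {
          return row;
        }
        // ... the real transformation for import-query rows goes here ...
        return row;
      }
    }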

Cheers Lee C


Re: Pubmed XML indexing

2013-09-27 Thread Francisco Fernandez
Many thanks to both Mike and Alexandre.
I'll take a peek at those tools.
Lux seems a good option.
Thanks again,

Francisco

El 27/09/2013, a las 09:33, Michael Sokolov escribió:

> You might be interested in Lux (http://luxdb.org), which is designed for 
> indexing and querying XML using Solr and Lucene.  It can run index-supported 
> XPath/XQuery over your documents, and you can define arbitrary XPath indexes.
> 
> -Mike
> 
> On 9/27/13 6:28 AM, Francisco Fernandez wrote:
>> Hi, I'm a newbie trying to index PubMed texts obtained as xml with
>> structure similar to:
>> 
>> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=23864173,22073418
>> 
>> The nodes I need to extract, expressed as XPaths would be:
>> 
>> //PubmedArticle/MedlineCitation/PMID
>> //PubmedArticle/MedlineCitation/DateCreated/Year
>> //PubmedArticle/MedlineCitation/Article/ArticleTitle
>> //PubmedArticle/MedlineCitation/Article/Abstract/AbstractText
>> //PubmedArticle/MedlineCitation/MeshHeadingList/MeshHeading
>> 
>> I think a way to index them in Solr is to create another xml structure 
>> similar to:
>> <add>
>> <doc>
>>   <field name="PMID">PMID</field>
>>   <field name="Year">Year</field>
>>   <field name="ArticleTitle">ArticleTitle</field>
>>   <field name="AbstractText">AbstractText</field>
>>   <field name="MeshHeading">MeshHeading1</field>
>>   <field name="MeshHeading">MeshHeading2</field>
>> </doc>
>> </add>
>> 
>> Being "PMID" = '23864173' and "ArticleTitle" = 'Cost-effectiveness of 
>> low-molecular-weight heparin compared with aspirin for prophylaxis against 
>> venous thromboembolism after total joint arthroplasty' and so on.
>> With that structure I would post it to Solr using the following statement 
>> over the documents folder
>> java -jar post.jar *.xml
>> 
>> I'm wondering if there is a more direct way to perform the same task that
>> does not imply an 'iterate->parsing->restructure->write to disk->post' cycle
>> Many thanks
>> 
>> Francisco
> 



Re: Solr client 'Timeout was reached' ~ when new documents are inserted and commits are made.

2013-09-27 Thread Rafał Radecki
On the client side the timeout is set to 5s, but when I look in the solr log I
see QTime values less than 5000 (in ms). We use jetty to start the solr
process; where should I look for directives connected with timeouts?




Re: Solr client 'Timeout was reached' ~ when new documents are inserted and commits are made.

2013-09-27 Thread Erick Erickson
No, this isn't normal. Your servlet container or your clients probably
have a too-short timeout. How long are we talking about here anyway?

Best,
Erick

On Fri, Sep 27, 2013 at 8:57 AM, Rafał Radecki
 wrote:
> Hi All.
>
> I have a solr 3.5 multicore installation. It has ~250 documents and
> ~1.5GB of index data.
> When solr is fed with new documents I see 'Timeout was reached' timeouts
> on clients for a few seconds.
> Is this normal behaviour of solr when inserting new documents?
>
> Best regards,
> Rafał Radecki.


Re: Solr Commit Time

2013-09-27 Thread Erick Erickson
No way to say. How have you configured your autowarming
parameters for instance? Why do you care? What problem are you
trying to solve? Solr automatically handles warming up
searchers and switching to the new one after a commit.

Best,
Erick

On Fri, Sep 27, 2013 at 7:56 AM, Prasi S  wrote:
> Hi,
> What would be the maximum commit time for indexing 1 lakh (100,000)
> documents in solr on a 32 GB machine?
>
>
>
> Thanks,
> Prasi


Re: autocomplete_edge type split words

2013-09-27 Thread Erick Erickson
Have you looked at "autoGeneratePhraseQueries"? That might help.

If that doesn't work, you can always do something like add an OR clause like
OR "original query"
and optionally boost it high. But I'd start with the autoGenerate bits.

Best,
Erick


On Fri, Sep 27, 2013 at 7:37 AM, elisabeth benoit
 wrote:
> Thanks for your answer.
>
> So I guess if someone wants to search on two fields, one with a phrase query
> and one with a "normal" query (split into words), one has to find a way to
> send the query twice: once with quotes and once without...
>
> Best regards,
> Elisabeth
>
>
> 2013/9/27 Erick Erickson 
>
>> This is a classic issue where there's confusion between
>> the query parser and field analysis.
>>
>> Early in the process the query parser has to take the input
>> and break it up. that's how, for instance, a query like
>> text:term1 term2
>> gets parsed as
>> text:term1 defaultfield:term2
>> This happens long before the terms get to the analysis chain
>> for the field.
>>
>> So your only options are to either quote the string or
>> escape the spaces.
>>
>> Best,
>> Erick
>>
>> On Wed, Sep 25, 2013 at 9:24 AM, elisabeth benoit
>>  wrote:
>> > Hello,
>> >
>> > I am using solr 4.2.1 and I have an autocomplete_edge type defined in
>> > schema.xml
>> >
>> >
>> > <fieldType name="autocomplete_edge" class="solr.TextField">
>> >   <analyzer type="index">
>> >     <charFilter class="solr.MappingCharFilterFactory"
>> >       mapping="mapping-ISOLatin1Accent.txt"/>
>> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
>> >     <filter class="solr.LowerCaseFilterFactory"/>
>> >     <filter class="solr.PatternReplaceFilterFactory" pattern="..."
>> >       replacement=" " replace="all"/>
>> >     <filter class="solr.EdgeNGramFilterFactory" maxGramSize="..."
>> >       minGramSize="1"/>
>> >   </analyzer>
>> >   <analyzer type="query">
>> >     <charFilter class="solr.MappingCharFilterFactory"
>> >       mapping="mapping-ISOLatin1Accent.txt"/>
>> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
>> >     <filter class="solr.LowerCaseFilterFactory"/>
>> >     <filter class="solr.PatternReplaceFilterFactory" pattern="..."
>> >       replacement=" " replace="all"/>
>> >     <filter class="solr.PatternReplaceFilterFactory"
>> >       pattern="^(.{30})(.*)?" replacement="$1" replace="all"/>
>> >   </analyzer>
>> > </fieldType>
>> >
>> > When I have a request with more than one word, for instance "rue de la",
>> > my request doesn't match with my autocomplete_edge field unless I use
>> > quotes around the query. In other words q=rue de la doesn't work and
>> > q="rue de la" works.
>> >
>> > I've checked the request with debugQuery=on, and I can see that in the
>> > first case the query is split into words, and I don't understand why,
>> > since my field type uses KeywordTokenizerFactory.
>> >
>> > Does anyone have a clue on how I can request my field without using
>> > quotes?
>> >
>> > Thanks,
>> > Elisabeth
>>


Re: SolrCloud setup - any advice?

2013-09-27 Thread Erick Erickson
I think you're right, but you can specify a default value in your schema.xml
to at least see if this is a good path to follow.

Best,
Erick

On Fri, Sep 27, 2013 at 3:46 AM, Neil Prosser  wrote:
> Good point. I'd seen docValues and wondered whether they might be of use in
> this situation. However, as I understand it they require a value to be set
> for all documents until Solr 4.5. Is that true or was I imagining reading
> that?
>
>
> On 25 September 2013 11:36, Erick Erickson  wrote:
>
>> H, I confess I haven't had a chance to play with this yet,
>> but have you considered docValues for some of your fields? See:
>> http://wiki.apache.org/solr/DocValues
>>
>> And just to tantalize you:
>>
>> > Since Solr4.2 to build a forward index for a field, for purposes of
>> sorting, faceting, grouping, function queries, etc.
>>
>> > You can specify a different docValuesFormat on the fieldType
>> (docValuesFormat="Disk") to only load minimal data on the heap, keeping
>> other data structures on disk.
>>
>> Do note, though:
>> > Not a huge improvement for a static index
>>
>> this latter isn't a problem though since you don't have a static index
>>
>> Erick
>>
>> On Tue, Sep 24, 2013 at 4:13 AM, Neil Prosser 
>> wrote:
>> > Shawn: unfortunately the current problems are with facet.method=enum!
>> >
>> > Erick: We already round our date queries so they're the same for at least
>> > an hour so thankfully our fq entries will be reusable. However, I'll
>> take a
>> > look at reducing the cache and autowarming counts and see what the effect
>> > on hit ratios and performance are.
>> >
>> > For SolrCloud our soft commit (openSearcher=false) interval is 15 seconds
>> > and our hard commit is 15 minutes.
>> >
>> > You're right about those sorted fields having a lot of unique values.
>> They
>> > can be any number between 0 and 10,000,000 (it's sparsely populated
>> across
>> > the documents) and could appear in several variants across multiple
>> > documents. This is probably a good area for seeing what we can bend with
>> > regard to our requirements for sorting/boosting. I've just looked at two
>> > shards and they've each got upwards of 1000 terms showing in the schema
>> > browser for one (potentially out of 60) fields.
>> >
>> >
>> >
>> > On 21 September 2013 20:07, Erick Erickson 
>> wrote:
>> >
>> >> About caches. The queryResultCache is only useful when you expect there
>> >> to be a number of _identical_ queries. Think of this cache as a map
>> where
>> >> the key is the query and the value is just a list of N document IDs
>> >> (internal)
>> >> where N is your window size. Paging is often the place where this is
>> used.
>> >> Take a look at your admin page for this cache, you can see the hit
>> rates.
>> >> But, the take-away is that this is a very small cache memory-wise,
>> varying
>> >> it is probably not a great predictor of memory usage.
>> >>
>> >> The filterCache is more intense memory wise, it's another map where the
>> >> key is the fq clause and the value is bounded by maxDoc/8. Take a
>> >> close look at this in the admin screen and see what the hit ratio is. It
>> >> may
>> >> be that you can make it much smaller and still get a lot of benefit.
>> >> _Especially_ considering it could occupy about 44G of memory.
>> >> (43,000,000 / 8) * 8192 And the autowarm count is excessive in
>> >> most cases from what I've seen. Cutting the autowarm down to, say, 16
>> >> may not make a noticeable difference in your response time. And if
>> >> you're using NOW in your fq clauses, it's almost totally useless, see:
>> >> http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/
>> >>
>> >> Also, read Uwe's excellent blog about MMapDirectory here:
>> >> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>> >> for some problems with over-allocating memory to the JVM. Of course
>> >> if you're hitting OOMs, well.
>> >>
>> >> bq: order them by one of their fields.
>> >> This is one place I'd look first. How many unique values are in each
>> field
>> >> that you sort on? This is one of the major memory consumers. You can
>> >> get a sense of this by looking at admin/schema-browser and selecting
>> >> the fields you sort on. There's a text box with the number of terms
>> >> returned,
>> >> then a / ### where ### is the total count of unique terms in the field.
>> >> NOTE:
>> >> in 4.4 this will be -1 for multiValued fields, but you shouldn't be
>> >> sorting on
>> >> those anyway. How many fields are you sorting on anyway, and of what
>> types?
>> >>
>> >> For your SolrCloud experiments, what are your soft and hard commit
>> >> intervals?
>> >> Because something is really screwy here. Your sharding moving the
>> >> number of docs down this low per shard should be fast. Back to the point
>> >> above, the only good explanation I can come up with from this remove is
>> >> that the fields you sort on have a LOT of unique values. It's possible
>> that
> >> the total number of unique values isn't scaling with sharding. That is,
> >> each shard may have, say, 90% of all unique terms (number from thin air).
> >> Worth checking anyway, but a stretch.
> >>
> >> This is definitely unusual...
> >>
> >> Best,
> >> Erick

Re: ContributorsGroup

2013-09-27 Thread Erick Erickson
Stefan is more thorough than me, I'd have added the wrong name :)

Thanks for volunteering!

Erick

On Thu, Sep 26, 2013 at 9:17 PM, JavaOne  wrote:
> Yes - that is me.
>
> mikelabib is my Jira user. Thanks for asking.
>
> Sent from my iPhone
>
> On Sep 26, 2013, at 7:32 PM, Erick Erickson  wrote:
>
>> Hmmm, did Stefan add you correctly? I see MichaelLabib as a
>> contributor, but not mikelabib...
>>
>> Best
>> Erick
>>
>> On Thu, Sep 26, 2013 at 1:20 PM, Mike L.  wrote:
>>>
>>> ah sorry! its: mikelabib
>>>
>>> thanks!
>>>
>>> From: Stefan Matheis 
>>> To: solr-user@lucene.apache.org
>>> Sent: Thursday, September 26, 2013 12:05 PM
>>> Subject: Re: ContributorsGroup
>>>
>>>
>>> Mike
>>>
>>> To add you as Contributor i'd need to know your Username? :)
>>>
>>> Stefan
>>>
>>>
>>> On Thursday, September 26, 2013 at 6:50 PM, Mike L. wrote:
>>>

 Solr Admins,

 I've been using Solr for the last couple of years and would like to 
 contribute to this awesome project. Can I be added to the 
 ContributorsGroup, with access to update the Wiki as well?

 Thanks in advance.

 Mike L.




Solr client 'Timeout was reached' ~ when new documents are inserted and commits are made.

2013-09-27 Thread Rafał Radecki
Hi All. 

I have a solr 3.5 multicore installation. It has ~250 documents and ~1.5GB
of index data.
When solr is fed with new documents I see 'Timeout was reached' timeouts on
clients for a few seconds.
Is this normal behaviour of solr when inserting new documents?

Best regards, 
Rafał Radecki. 


Re: Pubmed XML indexing

2013-09-27 Thread Michael Sokolov
You might be interested in Lux (http://luxdb.org), which is designed for 
indexing and querying XML using Solr and Lucene.  It can run 
index-supported XPath/XQuery over your documents, and you can define 
arbitrary XPath indexes.


-Mike

On 9/27/13 6:28 AM, Francisco Fernandez wrote:

Hi, I'm a newbie trying to index PubMed texts obtained as xml with
structure similar to:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=23864173,22073418

The nodes I need to extract, expressed as XPaths would be:

//PubmedArticle/MedlineCitation/PMID
//PubmedArticle/MedlineCitation/DateCreated/Year
//PubmedArticle/MedlineCitation/Article/ArticleTitle
//PubmedArticle/MedlineCitation/Article/Abstract/AbstractText
//PubmedArticle/MedlineCitation/MeshHeadingList/MeshHeading

I think a way to index them in Solr is to create another xml structure
similar to:

<add>
<doc>
  <field name="PMID">PMID</field>
  <field name="Year">Year</field>
  <field name="ArticleTitle">ArticleTitle</field>
  <field name="AbstractText">AbstractText</field>
  <field name="MeshHeading">MeshHeading1</field>
  <field name="MeshHeading">MeshHeading2</field>
</doc>
</add>

Being "PMID" = '23864173' and "ArticleTitle" = 'Cost-effectiveness of 
low-molecular-weight heparin compared with aspirin for prophylaxis against venous thromboembolism 
after total joint arthroplasty' and so on.
With that structure I would post it to Solr using the following statement over 
the documents folder
java -jar post.jar *.xml

I'm wondering if there is a more direct way to perform the same task that does
not imply an 'iterate->parsing->restructure->write to disk->post' cycle
Many thanks

Francisco




Re: Sum function causing error in solr

2013-09-27 Thread Yonik Seeley
On Fri, Sep 27, 2013 at 2:28 AM, Tanu Garg  wrote:
> tried this as well. but its not working.

It's working fine for me.  What version of Solr are you using?
What does your complete request look like?

-Yonik
http://lucidworks.com


Solr Commit Time

2013-09-27 Thread Prasi S
Hi,
What would be the maximum commit time for indexing 1 lakh (100,000) documents
in solr on a 32 GB machine?



Thanks,
Prasi


Re: Pubmed XML indexing

2013-09-27 Thread Alexandre Rafalovitch
Did you look at dataImportHandler? There is also Flume, I think.

Regards,
 Alex
On 27 Sep 2013 17:28, "Francisco Fernandez"  wrote:

> Hi, I'm a newbie trying to index PubMed texts obtained as xml with
> structure similar to:
>
>
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=23864173,22073418
>
> The nodes I need to extract, expressed as XPaths would be:
>
> //PubmedArticle/MedlineCitation/PMID
> //PubmedArticle/MedlineCitation/DateCreated/Year
> //PubmedArticle/MedlineCitation/Article/ArticleTitle
> //PubmedArticle/MedlineCitation/Article/Abstract/AbstractText
> //PubmedArticle/MedlineCitation/MeshHeadingList/MeshHeading
>
> I think a way to index them in Solr is to create another xml structure
> similar to:
> <add>
> <doc>
>   <field name="PMID">PMID</field>
>   <field name="Year">Year</field>
>   <field name="ArticleTitle">ArticleTitle</field>
>   <field name="AbstractText">AbstractText</field>
>   <field name="MeshHeading">MeshHeading1</field>
>   <field name="MeshHeading">MeshHeading2</field>
> </doc>
> </add>
>
> Being "PMID" = '23864173' and "ArticleTitle" = 'Cost-effectiveness of
> low-molecular-weight heparin compared with aspirin for prophylaxis against
> venous thromboembolism after total joint arthroplasty' and so on.
> With that structure I would post it to Solr using the following statement
> over the documents folder
> java -jar post.jar *.xml
>
> I'm wondering if there is a more direct way to perform the same task that
> does not imply an 'iterate->parsing->restructure->write to disk->post' cycle
> Many thanks
>
> Francisco


Re: autocomplete_edge type split words

2013-09-27 Thread elisabeth benoit
Thanks for your answer.

So I guess if someone wants to search on two fields, one with a phrase query
and one with a "normal" query (split into words), one has to find a way to
send the query twice: once with quotes and once without...

Best regards,
Elisabeth


2013/9/27 Erick Erickson 

> This is a classic issue where there's confusion between
> the query parser and field analysis.
>
> Early in the process the query parser has to take the input
> and break it up. that's how, for instance, a query like
> text:term1 term2
> gets parsed as
> text:term1 defaultfield:term2
> This happens long before the terms get to the analysis chain
> for the field.
>
> So your only options are to either quote the string or
> escape the spaces.
>
> Best,
> Erick
>
> On Wed, Sep 25, 2013 at 9:24 AM, elisabeth benoit
>  wrote:
> > Hello,
> >
> > I am using solr 4.2.1 and I have an autocomplete_edge type defined in
> > schema.xml
> >
> >
> > <fieldType name="autocomplete_edge" class="solr.TextField">
> >   <analyzer type="index">
> >     <charFilter class="solr.MappingCharFilterFactory"
> >       mapping="mapping-ISOLatin1Accent.txt"/>
> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.PatternReplaceFilterFactory" pattern="..."
> >       replacement=" " replace="all"/>
> >     <filter class="solr.EdgeNGramFilterFactory" maxGramSize="..."
> >       minGramSize="1"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <charFilter class="solr.MappingCharFilterFactory"
> >       mapping="mapping-ISOLatin1Accent.txt"/>
> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.PatternReplaceFilterFactory" pattern="..."
> >       replacement=" " replace="all"/>
> >     <filter class="solr.PatternReplaceFilterFactory"
> >       pattern="^(.{30})(.*)?" replacement="$1" replace="all"/>
> >   </analyzer>
> > </fieldType>
> >
> > When I have a request with more than one word, for instance "rue de la",
> > my request doesn't match with my autocomplete_edge field unless I use
> > quotes around the query. In other words q=rue de la doesn't work and
> > q="rue de la" works.
> >
> > I've checked the request with debugQuery=on, and I can see that in the
> > first case the query is split into words, and I don't understand why,
> > since my field type uses KeywordTokenizerFactory.
> >
> > Does anyone have a clue on how I can request my field without using
> > quotes?
> >
> > Thanks,
> > Elisabeth
>


Re: Sum function causing error in solr

2013-09-27 Thread Tanu Garg
Tried this as well, but it's not working.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sum-function-causing-error-in-solr-tp4091901p4092306.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Sum function causing error in solr

2013-09-27 Thread Tanu Garg
Yes Jack, I have tried this, but it's giving the same error.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sum-function-causing-error-in-solr-tp4091901p4092307.html
Sent from the Solr - User mailing list archive at Nabble.com.


Pubmed XML indexing

2013-09-27 Thread Francisco Fernandez
Hi, I'm a newbie trying to index PubMed texts obtained as xml with
structure similar to:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=23864173,22073418

The nodes I need to extract, expressed as XPaths would be:

//PubmedArticle/MedlineCitation/PMID
//PubmedArticle/MedlineCitation/DateCreated/Year
//PubmedArticle/MedlineCitation/Article/ArticleTitle
//PubmedArticle/MedlineCitation/Article/Abstract/AbstractText
//PubmedArticle/MedlineCitation/MeshHeadingList/MeshHeading

I think a way to index them in Solr is to create another xml structure
similar to:

<add>
<doc>
  <field name="PMID">PMID</field>
  <field name="Year">Year</field>
  <field name="ArticleTitle">ArticleTitle</field>
  <field name="AbstractText">AbstractText</field>
  <field name="MeshHeading">MeshHeading1</field>
  <field name="MeshHeading">MeshHeading2</field>
</doc>
</add>

Being "PMID" = '23864173' and "ArticleTitle" = 'Cost-effectiveness of 
low-molecular-weight heparin compared with aspirin for prophylaxis against 
venous thromboembolism after total joint arthroplasty' and so on.
With that structure I would post it to Solr using the following statement over 
the documents folder
java -jar post.jar *.xml

I'm wondering if there is a more direct way to perform the same task that does
not imply an 'iterate->parsing->restructure->write to disk->post' cycle.
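
For instance, I imagine something along these lines (a rough sketch using
SolrJ and javax.xml.xpath, untested; the field names are just the ones from
my example above) could go straight from the eutils response into Solr:

    import java.net.URL;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.*;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;
    import org.w3c.dom.*;

    public class PubmedIndexer {
      public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
        Document xml = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new URL("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
                + "efetch.fcgi?db=pubmed&retmode=xml&id=23864173,22073418")
                .openStream());
        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList articles = (NodeList) xp.evaluate("//PubmedArticle", xml,
            XPathConstants.NODESET);
        for (int i = 0; i < articles.getLength(); i++) {
          Node a = articles.item(i);
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("PMID", xp.evaluate("MedlineCitation/PMID", a));
          doc.addField("Year",
              xp.evaluate("MedlineCitation/DateCreated/Year", a));
          doc.addField("ArticleTitle",
              xp.evaluate("MedlineCitation/Article/ArticleTitle", a));
          doc.addField("AbstractText",
              xp.evaluate("MedlineCitation/Article/Abstract/AbstractText", a));
          // MeshHeading is multivalued: add each descriptor separately
          NodeList mesh = (NodeList) xp.evaluate(
              "MedlineCitation/MeshHeadingList/MeshHeading/DescriptorName", a,
              XPathConstants.NODESET);
          for (int j = 0; j < mesh.getLength(); j++) {
            doc.addField("MeshHeading", mesh.item(j).getTextContent());
          }
          solr.add(doc);
        }
        solr.commit();
      }
    }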
Many thanks

Francisco

Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception

2013-09-27 Thread Andreas Owen
I removed the FieldReaderDataSource and dataSource="fld" but it didn't help. I
get the following for each document:
DataImportHandlerException: Exception in invoking url null Processing 
Document # 9
NullPointerException


On 26. Sep 2013, at 8:39 PM, P Williams wrote:

> Hi,
> 
> Haven't tried this myself but maybe try leaving out the
> FieldReaderDataSource entirely.  From my quick searching looks like it's
> tied to SQL.  Did you try copying the
> http://wiki.apache.org/solr/TikaEntityProcessor Advanced Parsing example
> exactly?  What happens when you leave out FieldReaderDataSource?
> 
> Cheers,
> Tricia
> 
> 
> On Thu, Sep 26, 2013 at 4:17 AM, Andreas Owen  wrote:
> 
>> I'm using solr 4.3.1 and the dataimporter. I am trying to use
>> XPathEntityProcessor within the TikaEntityProcessor for indexing html pages,
>> but I'm getting this error for each document. I have also tried
>> dataField="tika.text" and dataField="text" to no avail. The nested
>> XPathEntityProcessor "detail" creates the error; the rest works fine. What
>> am I doing wrong?
>> 
>> error:
>> 
>> ERROR - 2013-09-26 12:08:49.006;
>> org.apache.solr.handler.dataimport.SqlEntityProcessor; The query failed
>> 'null'
>> java.lang.ClassCastException: java.io.StringReader cannot be cast to
>> java.util.Iterator
>>at
>> org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
>>at
>> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
>>at
>> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
>>at
>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465)
>>at
>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491)
>>at
>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491)
>>at
>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
>>at
>> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
>>at
>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
>>at
>> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
>>at
>> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
>>at
>> org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:179)
>>at
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>>at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
>>at
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
>>at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
>>at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
>>at
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
>>at
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
>>at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>>at
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
>>at
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>>at
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
>>at
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
>>at
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>>at
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
>>at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>>at
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>>at
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>>at
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>>at org.eclipse.jetty.server.Server.handle(Server.java:365)
>>at
>> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
>>at
>> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
>>at
>> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
>>at
>> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
>>at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856)
>>at
>> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
>>at
>> org.eclipse.jetty.server.BlockingHttpConnection

Re: Doing time sensitive search in solr

2013-09-27 Thread Alexandre Rafalovitch
If your different strings have different semantics (date, etc.), you may
need to split your entries based on those semantics.

Either have the 'entity' represent one 'string-date' structure, or have an
additional field that represents the content searchable during that specific
period, and only have one field with all the strings as stored (if you
absolutely need it).
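
For example (field names invented here): give each entry its own
go-live/expiry dates at index time, then filter on the current window at
query time:

    // only match entries whose validity window covers "now"
    SolrQuery q = new SolrQuery("toyota");
    q.addFilterQuery("golive:[* TO NOW]", "expire:[NOW TO *]");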

Search for Gilt's presentation on Solr, they deal with some of the similar
issues (flash sales).

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Sep 27, 2013 at 6:52 AM, Darniz  wrote:

> hello Users,
>
> I have a requirement where my content should be searched based upon time. For
> example, below is our content in our cms.
> 
> Sept content : Honda is releasing the car this month
> 
>
> 
> Dec content : Toyota is releasing the car this month
> 
>
> On the website we display the content based upon time. On the solr side,
> until now we were indexing all entry elements in Solr in a text field. Now
> that we have introduced time-sensitive information in our cms, I need to
> make sure that if someone queries for the word "Toyota" it does NOT come up
> in my search results, since that content is going live in dec.
>
> The solr text field looks something like
> 
> Honda is releasing the car this month
> Toyota is releasing this month
> 
>
> Is there a way we can search the text field, or append any metadata to the
> text field, based on date?
>
> I hope I have made the issue clear. I kind of disagree with this sort of
> practice, but our requirement is pretty peculiar since we don't want to
> reindex data again and again.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Doing-time-sensitive-search-in-solr-tp4092273.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Can i trust the order of how documents are received in solrcloud?

2013-09-27 Thread xaon
Hi, I am a new user of solrcloud, and I am wondering whether this scenario
could happen:

In a shard, I have three machines: leader, replica1, replica2.

replica1 receives a document D, and right after that, replica2 receives an
updated version of D; let's call it D'.

Both try to forward their documents to the leader, which will generate
version numbers for the documents and then distribute them to the replicas.

Is it possible that the leader receives D' prior to D,
so that D' gets overridden?

thanks a lot!




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-i-trust-the-order-of-how-documents-are-received-in-solrcloud-tp4092322.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ALIAS feature, can be used for what?

2013-09-27 Thread Yago Riveiro
I need to delete the alias for the old collection before pointing it to the
new one, right?
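
Or is it enough to just issue another CREATEALIAS pointing at the new
collection, something like this (a sketch against the Collections API;
names invented)?

    // repoint the alias "products" at collection2 via the Collections API
    URL url = new URL("http://localhost:8983/solr/admin/collections"
        + "?action=CREATEALIAS&name=products&collections=collection2");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    System.out.println("HTTP " + conn.getResponseCode());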

-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Friday, September 27, 2013 at 2:25 AM, Otis Gospodnetic wrote:

> Hi,
> 
> Imagine you have an index and you need to reindex your data into a new
> index, but don't want to have to reconfigure or restart client apps
> when you want to point them to the new index. This is where aliases
> come in handy. If you created an alias for the first index and made
> your apps hit that alias, then you can just repoint the same alias to
> your new index and avoid having to touch client apps.
> 
> No, I don't think you can write to multiple collections through a single 
> alias.
> 
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
> 
> 
> 
> > On Thu, Sep 26, 2013 at 6:34 AM, yriveiro <yago.rive...@gmail.com> wrote:
> > Today I was thinking about the ALIAS feature and the utility on Solr.
> > 
> > Can anyone explain me with an example where this feature may be useful?
> > 
> > It's possible have an ALIAS of multiples collections, if I do a write to the
> > alias, Is this write replied to all collections?
> > 
> > /Yago
> > 
> > 
> > 
> > -
> > Best regards
> > --
> > View this message in context: 
> > http://lucene.472066.n3.nabble.com/ALIAS-feature-can-be-used-for-what-tp4092095.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> > 
> 
> 
> 




Re: cold searcher

2013-09-27 Thread Dmitry Kan
Erick,

I actually agree, and we are looking into bundling commits into a batch-type
update, with soft commits serving the batches and hard commits kicking in at
larger intervals.
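
On the client side that could be as simple as this (a SolrJ sketch; the
interval is not final):

    // send the whole batch in one request and let Solr fold commits together
    UpdateRequest req = new UpdateRequest();
    req.add(docs);               // docs: a Collection<SolrInputDocument>
    req.setCommitWithin(60000);  // milliseconds
    req.process(server);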

In practice, we have already noticed the periodic slowdowns in search for
exactly the same queries before and after commit points. To describe it
briefly: the queries that used to take a lot of time to execute on solr 3.4
now execute super-fast, whereas during the periodic slowdowns they execute
as slowly as on solr 3.4. I bet there is a dependency, as you said, between
the warming of the several searchers and the flushing of caches.

Thanks,

Dmitry


On Fri, Sep 27, 2013 at 3:44 AM, Erick Erickson wrote:

> Upping the number of concurrent warming searchers is almost always the
> wrong thing to do. I'd lengthen the polling interval or the commit
> interval.
> Throwing away warming searchers is uselessly consuming resources. And
> if you're trying to do any filter queries, your caches will almost never be
> used since you're throwing them away so often.
>
> Best,
> Erick
>
> On Thu, Sep 26, 2013 at 3:52 PM, Shawn Heisey  wrote:
> > On 9/26/2013 10:56 AM, Dmitry Kan wrote:
> >>
> >> Btw, related to master-slave setup. What makes read-only slave not to
> come
> >> across the same issue? Would it not pull data from the master and warm
> up
> >> searchers? Or does it do updates in a more controlled fashion that makes
> >> it
> >> avoid these issues?
> >
> >
> > Most people have the slave pollInterval configured on an interval that's
> > pretty long, like 15 seconds to several minutes -- much longer than a
> > typical searcher warming time.
> >
> > For a slave, new searchers are only created when there is a change copied
> > over from the master.  There may be several master-side commits that
> happen
> > during the pollInterval, but the slave won't see all of those.
> >
> > Thanks,
> > Shawn
> >
>


Re: cold searcher

2013-09-27 Thread Dmitry Kan
Thanks Shawn. The master-slave setup is something that requires separate
study, as our update rate is more bulk-type than small incremental bits
(at least at this point). But thanks, this background information is always
useful.


On Thu, Sep 26, 2013 at 10:52 PM, Shawn Heisey  wrote:

> On 9/26/2013 10:56 AM, Dmitry Kan wrote:
>
>> Btw, related to master-slave setup. What makes read-only slave not to come
>> across the same issue? Would it not pull data from the master and warm up
>> searchers? Or does it do updates in a more controlled fashion that makes
>> it
>> avoid these issues?
>>
>
> Most people have the slave pollInterval configured on an interval that's
> pretty long, like 15 seconds to several minutes -- much longer than a
> typical searcher warming time.
>
> For a slave, new searchers are only created when there is a change copied
> over from the master.  There may be several master-side commits that happen
> during the pollInterval, but the slave won't see all of those.
>
> Thanks,
> Shawn
>
>


Re: SolrCloud setup - any advice?

2013-09-27 Thread Neil Prosser
Good point. I'd seen docValues and wondered whether they might be of use in
this situation. However, as I understand it they require a value to be set
for all documents until Solr 4.5. Is that true or was I imagining reading
that?


On 25 September 2013 11:36, Erick Erickson  wrote:

> H, I confess I haven't had a chance to play with this yet,
> but have you considered docValues for some of your fields? See:
> http://wiki.apache.org/solr/DocValues
>
> And just to tantalize you:
>
> > Since Solr4.2 to build a forward index for a field, for purposes of
> sorting, faceting, grouping, function queries, etc.
>
> > You can specify a different docValuesFormat on the fieldType
> (docValuesFormat="Disk") to only load minimal data on the heap, keeping
> other data structures on disk.
>
> Do note, though:
> > Not a huge improvement for a static index
>
> this latter isn't a problem though since you don't have a static index
>
> Erick
>
> On Tue, Sep 24, 2013 at 4:13 AM, Neil Prosser 
> wrote:
> > Shawn: unfortunately the current problems are with facet.method=enum!
> >
> > Erick: We already round our date queries so they're the same for at least
> > an hour so thankfully our fq entries will be reusable. However, I'll
> take a
> > look at reducing the cache and autowarming counts and see what the effect
> > on hit ratios and performance are.
> >
> > For SolrCloud our soft commit (openSearcher=false) interval is 15 seconds
> > and our hard commit is 15 minutes.
> >
> > You're right about those sorted fields having a lot of unique values.
> They
> > can be any number between 0 and 10,000,000 (it's sparsely populated
> across
> > the documents) and could appear in several variants across multiple
> > documents. This is probably a good area for seeing what we can bend with
> > regard to our requirements for sorting/boosting. I've just looked at two
> > shards and they've each got upwards of 1000 terms showing in the schema
> > browser for one (potentially out of 60) fields.
> >
> >
> >
> > On 21 September 2013 20:07, Erick Erickson 
> wrote:
> >
> >> About caches. The queryResultCache is only useful when you expect there
> >> to be a number of _identical_ queries. Think of this cache as a map
> where
> >> the key is the query and the value is just a list of N document IDs
> >> (internal)
> >> where N is your window size. Paging is often the place where this is
> used.
> >> Take a look at your admin page for this cache, you can see the hit
> rates.
> >> But, the take-away is that this is a very small cache memory-wise,
> varying
> >> it is probably not a great predictor of memory usage.
> >>
> >> The filterCache is more intense memory wise, it's another map where the
> >> key is the fq clause and the value is bounded by maxDoc/8. Take a
> >> close look at this in the admin screen and see what the hit ratio is. It
> >> may
> >> be that you can make it much smaller and still get a lot of benefit.
> >> _Especially_ considering it could occupy about 44G of memory.
> >> (43,000,000 / 8) * 8192 And the autowarm count is excessive in
> >> most cases from what I've seen. Cutting the autowarm down to, say, 16
> >> may not make a noticeable difference in your response time. And if
> >> you're using NOW in your fq clauses, it's almost totally useless, see:
> >> http://searchhub.org/2012/02/23/date-math-now-and-filter-queries/
> >>
> >> Also, read Uwe's excellent blog about MMapDirectory here:
> >> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> >> for some problems with over-allocating memory to the JVM. Of course
> >> if you're hitting OOMs, well.
> >>
> >> bq: order them by one of their fields.
> >> This is one place I'd look first. How many unique values are in each
> field
> >> that you sort on? This is one of the major memory consumers. You can
> >> get a sense of this by looking at admin/schema-browser and selecting
> >> the fields you sort on. There's a text box with the number of terms
> >> returned,
> >> then a / ### where ### is the total count of unique terms in the field.
> >> NOTE:
> >> in 4.4 this will be -1 for multiValued fields, but you shouldn't be
> >> sorting on
> >> those anyway. How many fields are you sorting on anyway, and of what
> types?
> >>
> >> For your SolrCloud experiments, what are your soft and hard commit
> >> intervals?
> >> Because something is really screwy here. Your sharding moving the
> >> number of docs down this low per shard should be fast. Back to the point
> >> above, the only good explanation I can come up with from this remove is
> >> that the fields you sort on have a LOT of unique values. It's possible
> that
> >> the total number of unique values isn't scaling with sharding. That is,
> >> each
> >> shard may have, say, 90% of all unique terms (number from thin air).
> Worth
> >> checking anyway, but a stretch.
> >>
> >> This is definitely unusual...
> >>
> >> Best,
> >> Erick
> >>
> >>
> >> On Thu, Sep 19, 2013 at 8:20 AM, Neil Prosser 
>