Re: Regarding Solr Faceting on the query response.

2014-01-30 Thread Alexandre Rafalovitch
Hi Nilesh,

I am not sure the faceting code does what you think it does. However,
there are different options and you can experiment with whichever one
is best for you. They are controlled by the facet.method parameter:
http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
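
For example, the same facet request with each method (a sketch; parameter
values as documented on that wiki page, field name taken from your example):

    q=company:Apple&facet=true&facet.field=company&facet.method=enum
    q=company:Apple&facet=true&facet.field=company&facet.method=fc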

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Fri, Jan 31, 2014 at 12:51 AM, Felipe Dantas de Souza Paiva
 wrote:
> Hi Nilesh,
>
> maybe Faceting is not the right thing for you, because 'faceting is the
> arrangement of search results into categories based on indexed terms' 
> (https://cwiki.apache.org/confluence/display/solr/Faceting).
>
> Perhaps you could use Result Clustering 
> (https://cwiki.apache.org/confluence/display/solr/Result+Clustering), for   
> the clustering algorithm is applied to the search result of each single query.
>
> Hope this helps.
>
> Felipe Dantas de Souza Paiva
> 
> From: Kuchekar [kuchekar.nil...@gmail.com]
> Sent: Thursday, January 30, 2014 15:35
> To: solr-user@lucene.apache.org
> Subject: Re: Regarding Solr Faceting on the query response.
>
> Hi Mikhail,
>
>  I would like my faceting to run only on the result set
> returned (i.e. only on the numFound documents), rather than on the whole index.
>
> In the example, even when I specify the query 'company:Apple' .. it gives
> me faceted results for other companies. This means that it is querying
> against the whole index, rather than just the result set.
>
> Using facet.mincount=1 will give me facet values whose counts are at least
> 1, but to retrieve all the distinct values (Apple, Bose,
> Chevron, ..Oracle..) of the facet field (company) it will again query the whole index.
>
> What I would like to do is ... facet only on the resultset.
>
> i.e. my query (q=company:Apple AND technologies:java) should return only
> the facet details about 'Apple', since that is the only company present in the
> result set. But it provides me the list of other company names ... which makes me
> believe that it is querying the whole index to get the distinct values for
> the company..
>
> "docs": [ { "id": "ABC123", "company": [ "APPLE" ] },
> { "id": "ABC1234", "company": [ "APPLE" ] },
> { "id": "ABC1235", "company": [ "APPLE" ] },
> { "id": "ABC1236", "company": [ "APPLE" ] } ] }, "facet_counts": { "
> facet_queries": { "p_company:ucsf\n": 1 }, "facet_fields": { "company": [
> "APPLE", 4, ] }, "facet_dates": {}, "facet_ranges": {} }
>
>
>  Thanks.
> Kuchekar, Nilesh
>
>
> On Thu, Jan 30, 2014 at 2:13 AM, Mikhail Khludnev <
> mkhlud...@griddynamics.com> wrote:
>
>> Hello
>> Do you mean setting
>> http://wiki.apache.org/solr/SimpleFacetParameters#facet.mincount to 1 or
>> you want to facet only returned page (rows) instead of full resultset
>> (numFound) ?
>>
>>
>> On Thu, Jan 30, 2014 at 6:24 AM, Nilesh Kuchekar
>> wrote:
>>
>> > Yeah it's a typo... I meant company:Apple
>> >
>> > Thanks
>> > Nilesh
>> >
>> > > On Jan 29, 2014, at 8:59 PM, Alexandre Rafalovitch > >
>> > wrote:
>> > >
>> > >> On Thu, Jan 30, 2014 at 3:43 AM, Kuchekar 
>> > wrote:
>> > >> company=Apple
>> > > Did you mean company:Apple ?
>> > >
>> > > Otherwise, that could be the issue.
>> > >
>> > > Regards,
>> > >   Alex.
>> > >
>> > >
>> > > Personal website: http://www.outerthoughts.com/
>> > > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> > > - Time is the quality of nature that keeps events from happening all
>> > > at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>> > > book)
>> >
>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> Principal Engineer,
>> Grid Dynamics
>>
>> 
>>  
>>
>
> 
>

Re: TemplateTransformer returns null values

2014-01-30 Thread Alexandre Rafalovitch
Hmm,

Try the variable reference without scope: ${id}. I can't remember if
the scope is required only for higher level items. It might also be
worth writing a very basic all-fields logger to see what your
in-progress map looks like.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Fri, Jan 31, 2014 at 7:10 AM, tom  wrote:
> Thanks Alexandre for the quick response,
>
> I tried both ways but still no luck, null values. Is there anything I am doing
> fundamentally wrong?
>
> query="select DOC_IDN, BILL_IDN from document_fact" >
>
>
> and
>
> query="select DOC_IDN as id ,BILL_IDN as bill_id from document_fact" >
>
>
>
>
>


Re: JVM heap constraints and garbage collection

2014-01-30 Thread Shawn Heisey

On 1/30/2014 3:20 PM, Joseph Hagerty wrote:

I'm using Solr 3.5 over Tomcat 6. My index has reached 30G.





- The box is an m1.large on AWS EC2. 2 virtual CPUs, 4 ECU, 7.5 GiB RAM


One detail that you did not provide was how much of your 7.5GB RAM you 
are allocating to the Java heap for Solr, but I actually don't think I 
need that information, because for your index size, you simply don't 
have enough. If you're sticking with Amazon, you'll want one of the 
instances with at least 30GB of RAM, and you might want to consider more 
memory than that.


An ideal RAM size for Solr is equal to the size of on-disk data plus the 
heap space used by Solr and other programs.  This means that if your 
java heap for Solr is 4GB and there are no other significant programs 
running on the same server, you'd want a minimum of 34GB of RAM for an 
ideal setup with your index.  4GB of that would be for Solr itself, the 
remainder would be for the operating system to fully cache your index in 
the OS disk cache.


Depending on your query patterns and how your schema is arranged, you 
*might* be able to get away with as little as half of your index size just 
for the OS disk cache, but it's better to make it big enough for the 
whole index, plus room for growth.


http://wiki.apache.org/solr/SolrPerformanceProblems

Many people are *shocked* when they are told this information, but if 
you think about the relative speeds of getting a chunk of data from a 
hard disk vs. getting the same information from memory, it's not all 
that shocking.
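
Back on the original heap/GC question: purely as an illustrative starting
point on the Sun JVM 6 (not a recommendation for your exact workload),
something like this in Tomcat's bin/setenv.sh:

    CATALINA_OPTS="$CATALINA_OPTS -Xms4g -Xmx4g \
      -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
      -XX:CMSInitiatingOccupancyFraction=70"

Pinning -Xms to -Xmx avoids heap resizing, and CMS usually gives shorter
pauses than the default collector on a read-heavy slave.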


Thanks,
Shawn



Re: Boosting documents by categorical preferences

2014-01-30 Thread Amit Nithian
Chris,

Sounds good! Thanks for the tips.. I'll be glad to submit my talk to this
as I have a writeup pretty much ready to go.

Cheers
Amit


On Tue, Jan 28, 2014 at 11:24 AM, Chris Hostetter
wrote:

>
> : The initial results seem to be kinda promising... of course there are
> many
> : more optimizations I could do like decay user ratings over time to
> indicate
> : that preferences decay over time so a 5 rating a year ago doesn't count
> as
> : much as a 5 rating today.
> :
> : Hope this helps others. I'll open source what I have soon and post back.
> If
> : there is feedback or other thoughts let me know!
>
> Hey Amit,
>
> Glad to hear your user based boosting experiments are paying off.  I would
> definitely love to see a more detailed writeup down the road showing off
> how it affects your final user metrics -- or perhaps even give a session
> on your technique at ApacheCon?
>
>
> http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: TemplateTransformer returns null values

2014-01-30 Thread tom
Thanks Alexandre for the quick response,

I tried both ways but still no luck, null values. Is there anything I am doing
fundamentally wrong?
 
query="select DOC_IDN, BILL_IDN from document_fact" >


and

query="select DOC_IDN as id ,BILL_IDN as bill_id from document_fact" >
   






Re: TemplateTransformer returns null values

2014-01-30 Thread Alexandre Rafalovitch
I think you have double mapping there:
*) select DOC_IDN as id
*) 
Both are mapping DOC_IDN to id, possibly with the second overriding the
first (or shadowing it).

Try dropping the 'as' part in the select and then look for .id . Or keep the
'as' part and just have an explicit field definition in the second one:
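
Something along these lines (a sketch only; table and column names taken
from your messages):

    <entity name="document_solr" transformer="TemplateTransformer"
            query="select DOC_IDN, BILL_IDN from document_solr">
      <!-- explicit mapping instead of 'as id' in the select -->
      <field column="DOC_IDN" name="id"/>
      <!-- TemplateTransformer builds new_url from the raw column -->
      <field column="new_url" template="${document_solr.DOC_IDN}"/>
    </entity>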


Regards,
  Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Fri, Jan 31, 2014 at 6:29 AM, tom  wrote:
> Hi,
> I am trying a simple transformer on data input using DIH, Solr 4.6. When I
> run the query below during DIH I get null values for new_url. What is wrong?
> I even tried with "${document_solr.id}"
>
> the name is
>
> data-config.xml:
>
>  transformer="TemplateTransformer,LogTransformer"
> query="select DOC_IDN as id, BILL_IDN as bill_id from document_solr"
> logTemplate="The name is ${document_solr.DOC_IDN}" logLevel="debug" >
>
> 
> 
>
> 
> 
>
>
>
> below stack trace:
> 8185946 [Thread-29] INFO  org.apache.solr.search.SolrIndexSearcher  –
> Opening Searcher@5a5f4cb7 realtime
> 8185960 [Thread-29] INFO  org.apache.solr.handler.dataimport.JdbcDataSource
> – Creating a connection for entity document_solr with URL:
> jdbc:oracle:thin:@vluedb01:1521:iedwdev
> 8186225 [Thread-29] INFO  org.apache.solr.handler.dataimport.JdbcDataSource
> – Time taken for getConnection():265
> 8186226 [Thread-29] DEBUG org.apache.solr.handler.dataimport.JdbcDataSource
> – Executing SQL: select DOC_IDN as id, BILL_IDN as bill_id from
> document_solr
> 8186291 [Thread-29] TRACE org.apache.solr.handler.dataimport.JdbcDataSource
> – Time taken for sql :64
> 8186301 [Thread-29] DEBUG org.apache.solr.handler.dataimport.LogTransformer
> – The name is
> 8186303 [Thread-29] DEBUG org.apache.solr.handler.dataimport.LogTransformer
> – The name is
> 8186303 [Thread-29] DEBUG org.apache.solr.handler.dataimport.LogTransformer
> – The name is
>
>
> `Tom
>
>
>
>


Re: Is there a way to get Solr to delete an uploaded document after its been indexed?

2014-01-30 Thread Alexandre Rafalovitch
Well, it's your crawler that submits them, so the crawler should know
when to delete them.

If you want some sort of trigger from Solr, look at postCommit hook
defined in solrconfig.xml. Though all that gives you is timing, not
which documents to deal with.

You could probably also plug into UpdateRequestProcessor chain, where
you do have access to the document content.
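
As a sketch, the postCommit listener route looks something like this in
solrconfig.xml (the script name and path here are hypothetical):

    <updateHandler class="solr.DirectUpdateHandler2">
      <listener event="postCommit" class="solr.RunExecutableListener">
        <str name="exe">cleanup-temp.sh</str>
        <str name="dir">/path/to/scripts</str>
        <bool name="wait">false</bool>
      </listener>
    </updateHandler>

Again, that only tells you a commit happened; the script itself still has
to decide which files in /temp are safe to remove.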

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Fri, Jan 31, 2014 at 3:40 AM, eShard  wrote:
> Hi,
> My crawler uploads all the documents to Solr for indexing via a tomcat/temp
> folder.
> Over time this folder grows so large that I run out of disk space.
> So, I wrote a bash script to delete the files and put it in the crontab.
> However, if I delete the docs too soon, they don't get indexed; too late and
> I run out of disk.
> I'm still trying to find the right window...
> So, (and this is probably a long shot)  I'm wondering if there's anything in
> Solr that can delete these docs from /temp after they've been indexed...
>
> Thank you,
>
>
>
>


TemplateTransformer returns null values

2014-01-30 Thread tom
Hi,
I am trying a simple transformer on data input using DIH, Solr 4.6. When I
run the query below during DIH I get null values for new_url. What is wrong?
I even tried with "${document_solr.id}"

the name is 

data-config.xml:




   






below stack trace:
8185946 [Thread-29] INFO  org.apache.solr.search.SolrIndexSearcher  –
Opening Searcher@5a5f4cb7 realtime
8185960 [Thread-29] INFO  org.apache.solr.handler.dataimport.JdbcDataSource
– Creating a connection for entity document_solr with URL:
jdbc:oracle:thin:@vluedb01:1521:iedwdev
8186225 [Thread-29] INFO  org.apache.solr.handler.dataimport.JdbcDataSource
– Time taken for getConnection():265
8186226 [Thread-29] DEBUG org.apache.solr.handler.dataimport.JdbcDataSource
– Executing SQL: select DOC_IDN as id, BILL_IDN as bill_id from
document_solr
8186291 [Thread-29] TRACE org.apache.solr.handler.dataimport.JdbcDataSource
– Time taken for sql :64
8186301 [Thread-29] DEBUG org.apache.solr.handler.dataimport.LogTransformer
– The name is
8186303 [Thread-29] DEBUG org.apache.solr.handler.dataimport.LogTransformer
– The name is
8186303 [Thread-29] DEBUG org.apache.solr.handler.dataimport.LogTransformer
– The name is


`Tom






JVM heap constraints and garbage collection

2014-01-30 Thread Joseph Hagerty
Greetings esteemed Solr-ites,

I'm using Solr 3.5 over Tomcat 6. My index has reached 30G.

Since my average load during peak hours is becoming quite high, and since
I'm finally starting to notice a little bit of performance degradation and
intermittent errors (e.g. "Solr returned response 0" on perfectly valid
reads during load spikes), I think it's time to tune my Slave box before
things get out of control.

In particular, *I am curious how others are tuning their JVM heap
constraints (-Xms, -Xmx, etc.) and garbage collection (parallel or
concurrent) to meet the needs of Solr*. I am using the Sun JVM Version 6,
not the fancy third party offerings.

Some more info, FWIW:

- Average document size in my index is probably around 6k
- Using CentOS
- Master-Slave setup. Master gets all the writes, Slave gets all the read
requests. It is the *Slave* that is suffering-- the Master seems fine.
- The box is an m1.large on AWS EC2. 2 virtual CPUs, 4 ECU, 7.5 GiB RAM
- DaemonThreads skyrocket during the aforementioned load spikes

Thanks for reading, and to the devs: thanks for an excellent product.

-- 
- Joe


Re: Required local configuration with ZK solr.xml?

2014-01-30 Thread Jeff Wartes
Found it. In case anyone else cares, this appears to be the root issue:
https://issues.apache.org/jira/browse/SOLR-5128

Thanks again.


On 1/30/14, 9:01 AM, "Jeff Wartes"  wrote:

>
>>Work is underway towards a new mode where zookeeper is the ultimate
>>source of truth, and each node will behave accordingly to implement and
>>maintain that truth.  I can't seem to locate a Jira issue for it,
>>unfortunately.  It's possible that one doesn't exist yet, or that it has
>>an obscure title.  Mark Miller is the one who really understands the
>>full details, as he's a primary author of SolrCloud code.
>>
>>Currently, what SolrCloud considers to be "truth" is dictated by both
>>zookeeper and an amalgamation of which cores each server actually has
>>present.  The collections API modifies both.  With an older config (all
>>current and future 4.x versions), the latter is in solr.xml.  If you're
>>using the new solr.xml format (available 4.4 and later, will be
>>mandatory in 5.0), it's done with Core Discovery.  Zookeeper has a list
>>of everything and coordinates the cluster state, but has no real control
>>over the cores that actually exist on each server.  When the two sources
>>of truth disagree, nothing happens to fix the situation, manual
>>intervention is required.
>
>
>Thanks Shawn, this was exactly the confirmation I was looking for. I think
>I have a much better understanding now.
>
>The takeaway I have is that SolrCloud's current automation assumes
>relatively static clusters, and that if I want anything like dynamic
>scaling, I'm going to have to write my own tooling to add nodes safely.
>
>Fortunately, it appears that the necessary CoreAdmin commands don't need
>much besides the collection name, so it smells like a simple thing to
>query zookeeper's /collections path (or clusterstate.json) and issue GET
>requests accordingly when I spin up a new node.
>
>If you (or anyone) does happen to recall a reference to the work you
>alluded to, I'd certainly be interested. I googled around myself for a few
>minutes, but haven't found anything so far.
>
>



Is there a way to get Solr to delete an uploaded document after its been indexed?

2014-01-30 Thread eShard
Hi,
My crawler uploads all the documents to Solr for indexing via a tomcat/temp
folder.
Over time this folder grows so large that I run out of disk space.
So, I wrote a bash script to delete the files and put it in the crontab.
However, if I delete the docs too soon, they don't get indexed; too late and
I run out of disk.
I'm still trying to find the right window...
So, (and this is probably a long shot)  I'm wondering if there's anything in
Solr that can delete these docs from /temp after they've been indexed...

Thank you,






Geospatial clustering + zoom in/out help

2014-01-30 Thread Bojan Šmid
Hi,

I have an index with 300K docs with lat,lon. I need to cluster the docs
based on lat,lon for display in the UI. The user then needs to be able to
click on any cluster and zoom in (up to 11 levels deep).

I'm using Solr 4.6 and I'm wondering how best to implement this efficiently?

A bit more specific questions below.

I need to:

1) cluster data points at different zoom levels

2) click on a specific cluster and zoom in

3) be able to select a region (bounding box or polygon) and show clusters
in the selected area

What's the best way to implement this so that queries are fast?

What I thought I would try, but maybe there are better ways:

* divide the world into NxM large squares and then each of these squares into
4 more squares, and so on - 11 levels deep

* at index time figure out all squares (at all 11 levels) each data point
belongs to and index that info into 11 different fields: e.g.


* at search time, use field collapsing on zoomX field to get which docs
belong to which square on particular level

* calculate center point of each square (by calculating mean value of
positions for all points in that square) using StatsComponent (facet on
zoomX field, avg on lat and lon fields) - I would consider those squares as
separate clusters (one square is one cluster) and center points of those
squares as center points of clusters derived from them
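
As a concrete sketch of that last step (zoom level 5 picked arbitrarily,
field names as above), the StatsComponent request could look like:

    q=*:*&rows=0&stats=true&stats.field=lat&stats.field=lon&stats.facet=zoom5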

I *think* the problem with this approach is that:

* there will be many unique fields for bigger zoom levels, which means
field collapsing / StatsComponent maaay not work fast enough

* clusters will not look very natural because I would have many clusters on
each zoom level and what are "real" geographical clusters would be
displayed as multiple clusters since their points would in some cases be
dispersed into multiple squares. But that may be OK

* a lot will depend on how the squares are calculated - linearly dividing
360 degrees by N to get "equal" size squares in degrees would produce
issues with "real" square sizes and counts of points in each of them


So I'm wondering if there is a better way?

Thanks,


  Bojan


Adding DocValues in an existing field

2014-01-30 Thread yriveiro
Hi,

Can I add docValues to an existing field without wiping the current data?

The modification on the schema will be something like this:
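
Roughly (the field name and type here are only placeholders):

    <field name="myfield" type="string" indexed="true" stored="true"
           docValues="true"/>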



I want to use the current data to reindex it into the same collection, creating
the docValues in the process. Is that possible?

I'm using solr 4.6.1



-
Best regards


RES: Regarding Solr Faceting on the query response.

2014-01-30 Thread Felipe Dantas de Souza Paiva
Hi Nilesh,

maybe Faceting is not the right thing for you, because 'faceting is the 
arrangement of search results into categories based on indexed terms' 
(https://cwiki.apache.org/confluence/display/solr/Faceting).

Perhaps you could use Result Clustering 
(https://cwiki.apache.org/confluence/display/solr/Result+Clustering), for the
clustering algorithm is applied to the search result of each single query.

Hope this helps.

Felipe Dantas de Souza Paiva

From: Kuchekar [kuchekar.nil...@gmail.com]
Sent: Thursday, January 30, 2014 15:35
To: solr-user@lucene.apache.org
Subject: Re: Regarding Solr Faceting on the query response.

Hi Mikhail,

 I would like my faceting to run only on the result set
returned (i.e. only on the numFound documents), rather than on the whole index.

In the example, even when I specify the query 'company:Apple' .. it gives
me faceted results for other companies. This means that it is querying
against the whole index, rather than just the result set.

Using facet.mincount=1 will give me facet values whose counts are at least
1, but to retrieve all the distinct values (Apple, Bose,
Chevron, ..Oracle..) of the facet field (company) it will again query the whole index.

What I would like to do is ... facet only on the resultset.

i.e. my query (q=company:Apple AND technologies:java) should return only
the facet details about 'Apple', since that is the only company present in the
result set. But it provides me the list of other company names ... which makes me
believe that it is querying the whole index to get the distinct values for
the company..

"docs": [ { "id": "ABC123", "company": [ "APPLE" ] },
{ "id": "ABC1234", "company": [ "APPLE" ] },
{ "id": "ABC1235", "company": [ "APPLE" ] },
{ "id": "ABC1236", "company": [ "APPLE" ] } ] }, "facet_counts": { "
facet_queries": { "p_company:ucsf\n": 1 }, "facet_fields": { "company": [
"APPLE", 4, ] }, "facet_dates": {}, "facet_ranges": {} }


 Thanks.
Kuchekar, Nilesh


On Thu, Jan 30, 2014 at 2:13 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Hello
> Do you mean setting
> http://wiki.apache.org/solr/SimpleFacetParameters#facet.mincount to 1 or
> you want to facet only returned page (rows) instead of full resultset
> (numFound) ?
>
>
> On Thu, Jan 30, 2014 at 6:24 AM, Nilesh Kuchekar
> wrote:
>
> > Yeah it's a typo... I meant company:Apple
> >
> > Thanks
> > Nilesh
> >
> > > On Jan 29, 2014, at 8:59 PM, Alexandre Rafalovitch  >
> > wrote:
> > >
> > >> On Thu, Jan 30, 2014 at 3:43 AM, Kuchekar 
> > wrote:
> > >> company=Apple
> > > Did you mean company:Apple ?
> > >
> > > Otherwise, that could be the issue.
> > >
> > > Regards,
> > >   Alex.
> > >
> > >
> > > Personal website: http://www.outerthoughts.com/
> > > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > > - Time is the quality of nature that keeps events from happening all
> > > at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> > > book)
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
>  
>





Re: Regarding Solr Faceting on the query response.

2014-01-30 Thread Kuchekar
Hi Mikhail,

 I would like my faceting to run only on the result set
returned (i.e. only on the numFound documents), rather than on the whole index.

In the example, even when I specify the query 'company:Apple' .. it gives
me faceted results for other companies. This means that it is querying
against the whole index, rather than just the result set.

Using facet.mincount=1 will give me facet values whose counts are at least
1, but to retrieve all the distinct values (Apple, Bose,
Chevron, ..Oracle..) of the facet field (company) it will again query the whole index.

What I would like to do is ... facet only on the resultset.

i.e. my query (q=company:Apple AND technologies:java) should return only
the facet details about 'Apple', since that is the only company present in the
result set. But it provides me the list of other company names ... which makes me
believe that it is querying the whole index to get the distinct values for
the company..
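
For reference, the shape of the request I am running (a sketch, with the
field names from my example):

    q=company:Apple AND technologies:java&facet=true&facet.field=company&facet.mincount=1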

"docs": [ { "id": "ABC123", "company": [ "APPLE" ] },
{ "id": "ABC1234", "company": [ "APPLE" ] },
{ "id": "ABC1235", "company": [ "APPLE" ] },
{ "id": "ABC1236", "company": [ "APPLE" ] } ] }, "facet_counts": { "
facet_queries": { "p_company:ucsf\n": 1 }, "facet_fields": { "company": [
"APPLE", 4, ] }, "facet_dates": {}, "facet_ranges": {} }


 Thanks.
Kuchekar, Nilesh


On Thu, Jan 30, 2014 at 2:13 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Hello
> Do you mean setting
> http://wiki.apache.org/solr/SimpleFacetParameters#facet.mincount to 1 or
> you want to facet only returned page (rows) instead of full resultset
> (numFound) ?
>
>
> On Thu, Jan 30, 2014 at 6:24 AM, Nilesh Kuchekar
> wrote:
>
> > Yeah it's a typo... I meant company:Apple
> >
> > Thanks
> > Nilesh
> >
> > > On Jan 29, 2014, at 8:59 PM, Alexandre Rafalovitch  >
> > wrote:
> > >
> > >> On Thu, Jan 30, 2014 at 3:43 AM, Kuchekar 
> > wrote:
> > >> company=Apple
> > > Did you mean company:Apple ?
> > >
> > > Otherwise, that could be the issue.
> > >
> > > Regards,
> > >   Alex.
> > >
> > >
> > > Personal website: http://www.outerthoughts.com/
> > > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > > - Time is the quality of nature that keeps events from happening all
> > > at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> > > book)
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
>  
>


Re: Required local configuration with ZK solr.xml?

2014-01-30 Thread Jeff Wartes

>Work is underway towards a new mode where zookeeper is the ultimate
>source of truth, and each node will behave accordingly to implement and
>maintain that truth.  I can't seem to locate a Jira issue for it,
>unfortunately.  It's possible that one doesn't exist yet, or that it has
>an obscure title.  Mark Miller is the one who really understands the
>full details, as he's a primary author of SolrCloud code.
>
>Currently, what SolrCloud considers to be "truth" is dictated by both
>zookeeper and an amalgamation of which cores each server actually has
>present.  The collections API modifies both.  With an older config (all
>current and future 4.x versions), the latter is in solr.xml.  If you're
>using the new solr.xml format (available 4.4 and later, will be
>mandatory in 5.0), it's done with Core Discovery.  Zookeeper has a list
>of everything and coordinates the cluster state, but has no real control
>over the cores that actually exist on each server.  When the two sources
>of truth disagree, nothing happens to fix the situation, manual
>intervention is required.


Thanks Shawn, this was exactly the confirmation I was looking for. I think
I have a much better understanding now.

The takeaway I have is that SolrCloud's current automation assumes
relatively static clusters, and that if I want anything like dynamic
scaling, I'm going to have to write my own tooling to add nodes safely.

Fortunately, it appears that the necessary CoreAdmin commands don't need
much besides the collection name, so it smells like a simple thing to
query zookeeper's /collections path (or clusterstate.json) and issue GET
requests accordingly when I spin up a new node.
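
For the record, the per-collection call I have in mind is just a CoreAdmin
CREATE, something like (host and names hypothetical):

    curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=mycoll_shard1_replica3&collection=mycoll&shard=shard1'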

If you (or anyone) does happen to recall a reference to the work you
alluded to, I'd certainly be interested. I googled around myself for a few
minutes, but haven't found anything so far.




Re: Required local configuration with ZK solr.xml?

2014-01-30 Thread Shawn Heisey
On 1/29/2014 12:48 PM, Jeff Wartes wrote:
> And that, I think, is my misunderstanding. I had assumed that the link
> between a node and the collections it belongs to would be the (possibly
> chroot'ed) zookeeper reference *itself*, not the node's directory
> structure. Instead, it appears that ZK is simply a repository for the
> collection configuration, where nodes may look up what they need based on
> filesystem core references.

Work is underway towards a new mode where zookeeper is the ultimate
source of truth, and each node will behave accordingly to implement and
maintain that truth.  I can't seem to locate a Jira issue for it,
unfortunately.  It's possible that one doesn't exist yet, or that it has
an obscure title.  Mark Miller is the one who really understands the
full details, as he's a primary author of SolrCloud code.

Currently, what SolrCloud considers to be "truth" is dictated by both
zookeeper and an amalgamation of which cores each server actually has
present.  The collections API modifies both.  With an older config (all
current and future 4.x versions), the latter is in solr.xml.  If you're
using the new solr.xml format (available 4.4 and later, will be
mandatory in 5.0), it's done with Core Discovery.  Zookeeper has a list
of everything and coordinates the cluster state, but has no real control
over the cores that actually exist on each server.  When the two sources
of truth disagree, nothing happens to fix the situation, manual
intervention is required.

Any errors in my understanding of SolrCloud are my own.  I don't claim
that what I just wrote is error-free, but I am pretty sure that it's
essentially correct.

Thanks,
Shawn



Error when restarting solr servers

2014-01-30 Thread lansing
Hello,
I am running SolrCloud with 2 collections, 5 shards and 3 replicas per
collection, and 5 ZooKeeper instances.
solr-4.6.0
apache-tomcat-7.0.39
zookeeper-3.4.5
jre1.7.0_21

When I try to restart a Solr server in my SolrCloud cluster I am receiving
these errors:

1861449 [localhost-startStop-1-EventThread] INFO 
org.apache.solr.cloud.ShardLeaderElectionContext  – Running the leader
process for shard shard1
1861451 [localhost-startStop-1-EventThread] INFO 
org.apache.solr.cloud.ShardLeaderElectionContext  – Checking if I should try
and be the leader.
1861451 [localhost-startStop-1-EventThread] INFO 
org.apache.solr.cloud.ShardLeaderElectionContext  – My last published State
was down, I won't be the leader.
1861451 [localhost-startStop-1-EventThread] INFO 
org.apache.solr.cloud.ShardLeaderElectionContext  – There may be a better
leader candidate than us - going back into recovery
1861452 [localhost-startStop-1-EventThread] INFO 
org.apache.solr.update.DefaultSolrCoreState  – Running recovery - first
canceling any ongoing recovery
1861452 [localhost-startStop-1-EventThread] WARN 
org.apache.solr.cloud.RecoveryStrategy  – Stopping recovery for
zkNodeName=core_node3core=Current1_shard1_replica3
1862223 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy  –
Finished recovery process. core=Current1_shard1_replica3
1862223 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy  –
Starting recovery process.  core=Current1_shard1_replica3
recoveringAfterStartup=false
1862223 [RecoveryThread] ERROR org.apache.solr.update.UpdateLog  – Exception
reading versions from log
java.nio.channels.ClosedChannelException
at sun.nio.ch.FileChannelImpl.ensureOpen(Unknown Source)
at sun.nio.ch.FileChannelImpl.read(Unknown Source)
at
org.apache.solr.update.ChannelFastInputStream.readWrappedStream(TransactionLog.java:778)
at
org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:89)
at
org.apache.solr.common.util.FastInputStream.readUnsignedByte(FastInputStream.java:71)
at
org.apache.solr.common.util.FastInputStream.readInt(FastInputStream.java:216)
at
org.apache.solr.update.TransactionLog$FSReverseReader.<init>(TransactionLog.java:696)
at
org.apache.solr.update.TransactionLog.getReverseReader(TransactionLog.java:575)
at
org.apache.solr.update.UpdateLog$RecentUpdates.update(UpdateLog.java:942)
at
org.apache.solr.update.UpdateLog$RecentUpdates.access$000(UpdateLog.java:885)
at
org.apache.solr.update.UpdateLog.getRecentUpdates(UpdateLog.java:1042)
at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:280)
at
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:244)
1862223 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy  –
Error while trying to recover.
core=Current1_shard1_replica3:org.apache.solr.common.SolrException: Cloud
state still says we are leader.
at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:354)
at
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:244)

1862224 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy  –
Recovery failed - trying again... (0) core=Current1_shard1_replica3
1862224 [RecoveryThread] INFO  org.apache.solr.cloud.RecoveryStrategy  –
Wait 2.0 seconds before trying to recover again (1)
1862541 [localhost-startStop-1-SendThread(10.0.5.230:2281)] WARN 
org.apache.zookeeper.ClientCnxn  – Session 0x542fd3f2be100e6 for server
10.0.5.230/10.0.5.230:2281, unexpected error, closing socket connection and
attempting reconnect
java.io.IOException: Packet len11106511 is out of range!
at
org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:79)
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
1862641 [localhost-startStop-1-EventThread] INFO 
org.apache.solr.cloud.DistributedQueue  – Watcher fired on path: null state:
Disconnected type None
1862641 [localhost-startStop-1-EventThread] INFO 
org.apache.solr.cloud.DistributedQueue  – Watcher fired on path: null state:
Disconnected type None
1862641 [localhost-startStop-1-EventThread] INFO 
org.apache.solr.cloud.DistributedQueue  – Watcher fired on path: null state:
Disconnected type None
1862641 [localhost-startStop-1-EventThread] INFO 
org.apache.solr.cloud.DistributedQueue  – Watcher fired on path: null state:
Disconnected type None


..


1270268 [http-bio-8201-exec-26] INFO 
org.apache.solr.handler.admin.CoreAdminHandler  – Going to wait for
coreNodeName: core_node10, state: recovering, checkLive: true, onlyIfLeader:
true
1270268 [http-bio-8201-exec-10] INFO 
org.apache.solr.handler.admin.CoreAdminHandler  – Going to wait for
coreNodeName: core_node11, state: recovering, checkLive

Re: KeywordTokenizerFactory - trouble with "exact" matches

2014-01-30 Thread Aleksander Akerø
I've come across something like this as well, can't remember where, but it
was often related to synonym functionality.

The following link shows a 3rd party QueryParser that seems to deal with
synonyms alongside edismax, and may be interesting to look at:
http://wiki.apache.org/solr/QueryParser

It is also mentioned as an issue while using the synonymFilterFactory:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
"The Lucene QueryParser tokenizes on white space before giving any text to
the Analyzer, so if a person searches for the words sea biscit the analyzer
will be given the words "sea" and "biscit" separately, and will not know
that they match a synonym".

Maybe the extended support for synonym handling is what will give us the
solution one day. For now I have solved my problem and will leave it at
that.

*Aleksander Akerø*
Systemkonsulent
Mobil: 944 89 054
E-post: aleksan...@gurusoft.no

*Gurusoft AS*
Telefon: 92 44 09 99
Østre Kullerød
www.gurusoft.no


2014-01-30 Jack Krupansky :

> I vaguely recall that there was a Jira floating around for multi-word
> synonyms that dealt with parsing of spaces as well. And Robert Muir has
> (repeatedly) referred to this query parser feature as a "bug". Somehow,
> eventually, I think it will be dealt with, but the "difficulty" remains for
> now.
>
> -- Jack Krupansky
>
> -Original Message- From: Aleksander Akerø
> Sent: Thursday, January 30, 2014 9:31 AM
>
> To: solr-user@lucene.apache.org
> Subject: Re: KeywordTokenizerFactory - trouble with "exact" matches
>
> Yes, I actually noted that about the filter vs. tokenizer. It's easy to get
> confused if you don't have a good understanding of the differences between
> tokenizers and filters.
>
> As for the query parser problem, there's always a workaround, but it was
> nice to be made aware of. It sort of was a ghost-like problem before.
> Although it would be great to have the opportunity to "disable" the
> splitting on whitespace even for DisMax, I understand that it is probably not
> the most wanted feature for the next Solr release :)
>
> *Aleksander Akerø*
> Systemkonsulent
> Mobil: 944 89 054
> E-post: aleksan...@gurusoft.no
>
> *Gurusoft AS*
> Telefon: 92 44 09 99
> Østre Kullerød
> www.gurusoft.no
>
>
> 2014-01-30 Erick Erickson :
>
>  Note, the comments about lowercasetokenizer were a red herring. You were
>> using LowerCaseFilterFactory. note "Filter" rather than "Tokenizer". So it
>> would
>> just do what you expected, lowercase the entire input. You would have used
>> LowerCaseTokenizerFactory in place of KeywordTokenizerFactory, not as a
>> Filter.
>>
>> As for the rest, I expect Jack is right, it's the query parsing above
>> the field input.
>>
>> Best
>> Erick
>>
>> On Thu, Jan 30, 2014 at 6:29 AM, Aleksander Akerø
>>  wrote:
>> > Hi Srinivasa
>> >
>> > Yes I've come to understand that the analyzers will never "see" the
>> > whitespace, thus no need for patternreplacement, like Jack points out.
>> > So
>> > the solution would be to set which parser to use for the query. Also Jack
>> > has pointed out that the "field" queryparser should work in this
>> particular
>> > setting -> http://wiki.apache.org/solr/QueryParser
>> >
>> > My problem was though, that it was only for one of the fields in the
>> schema
>> > that i needed this for, but for all the other fields, e.g. name,
>> > description etc., I would very much like to make use of the eDisMax
>> > functionality. And it seems that there can only be defined one query
>> parser
>> > per query. in other words: for all fields. Jack, you may correct me if
>> I'm
>> > wrong here :)
>> >
>> > This particular customer wanted a wildcard search at both ends of the
>> > phrase, and that sort of ambiguated the problem. And therefore I chose
>> > to
>> > replace all whitespace for this field in sql at index time, using the
>> DIH.
>> > And then using EdgeNGramFilterFactory on both sides of the keyword like
>> the
>> > config below, and that seemed to work pretty nicely.
>> >
>> > <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>> >   <analyzer type="index">
>> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
>> >     <filter class="solr.LowerCaseFilterFactory"/>
>> >     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" side="front"/>
>> >     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" side="back"/>
>> >   </analyzer>
>> >   <analyzer type="query">
>> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
>> >     <filter class="solr.LowerCaseFilterFactory"/>
>> >   </analyzer>
>> > </fieldType>
>> >
>> > I also added a bit of extra weighting for the "keyword" field so that
>> exact
>> > matches received a higher score.
>> >
>> > What this solution doesn't do is to exclude values like "EE 009", when
>> > searching for "FE 009", but they return far down on the list, which for
>> the
>> > customer is ok, because usually these results are somewhat related or
>> > within the same category.
>> >
>> > *Aleksander Akerø*
>> > Systemkonsulent
>> > Mobil: 944 89 054
>> > E-post: aleksan...@gurusoft.no
>> >
>> > *Gurusoft AS*
>> > Telefon: 9

Re: ant eclipse hangs - branch_4x

2014-01-30 Thread Per Steffensen

Hi

I used Ivy 2.2.0. Upgraded to 2.3.0. Didn't help.
No .lck files were found in ~/.ivy2/cache, so there was nothing to delete.
Deleted the entire ~/.ivy2/cache folder. Didn't help.
Debugged a little and found that it was hanging due to the org.apache.hadoop 
dependencies in solr/core/ivy.xml - if I commented out everything that 
had to do with hadoop in that ivy.xml, it didn't hang in "ant resolve" 
(from solr/core).
Finally the problem was solved when I tried adding 
http://central.maven.org/maven2 to our Artifactory. I do not understand 
why that was necessary, because we already had 
http://repo1.maven.org/maven2/ in our Artifactory.


Well never mind - it works for me now.

Thanks for the help!

Regards, Per Steffensen

On 1/30/14 1:11 PM, Steve Rowe wrote:

Hi Per,

You may be seeing the stale-Ivy-lock problem (see IVY-1388). LUCENE-4636 
upgraded the bootstrapped Ivy to 2.3.0 to reduce the likelihood of this 
problem, so the first thing is to make sure you have that version in 
~/.ant/lib/ - if not, remove the Ivy jar that’s there and run ‘ant 
ivy-bootstrap’ to download and put the 2.3.0 jar in place.

You should run the following and remove any files it finds:

 find ~/.ivy2/cache -name '*.lck'
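
or, to remove them in one go (assuming your find supports -delete):

 find ~/.ivy2/cache -name '*.lck' -delete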

That should stop ‘ant resolve’ from hanging.

Steve
  
On Jan 30, 2014, at 5:06 AM, Per Steffensen  wrote:



Hi

Earlier I used to be able to successfully run "ant eclipse" from branch_4x. With the 
newest code (tip of branch_4x today) I can't. "ant eclipse" hangs forever at the point 
shown by the console output below. I noticed that this problem has been around for a while - not 
something that happened today. Any idea about what might be wrong? A solution? Help to debug?

Regards Per Steffensen

--- console when running "ant eclipse" -

...

resolve:
 [echo] Building solr-example-DIH...

ivy-availability-check:
 [echo] Building solr-example-DIH...

ivy-fail:

ivy-configure:
[ivy:configure] :: loading settings :: file = 
/Some/Path/ws_kepler_apache_lucene_solr_4x/branch_4x/lucene/ivy-settings.xml

resolve:

resolve:
 [echo] Building solr-core...

ivy-availability-check:
 [echo] Building solr-core...

ivy-fail:

ivy-fail:

ivy-configure:
[ivy:configure] :: loading settings :: file = 
/Some/Path/ws_kepler_apache_lucene_solr_4x/branch_4x/lucene/ivy-settings.xml

resolve:

HERE IT JUST HANGS FOREVER
-






Re: KeywordTokenizerFactory - trouble with "exact" matches

2014-01-30 Thread Jack Krupansky
I vaguely recall that there was a Jira floating around for multi-word 
synonyms that dealt with parsing of spaces as well. And Robert Muir has 
(repeatedly) referred to this query parser feature as a "bug". Somehow, 
eventually, I think it will be dealt with, but the "difficulty" remains for 
now.


-- Jack Krupansky

-Original Message- 
From: Aleksander Akerø

Sent: Thursday, January 30, 2014 9:31 AM
To: solr-user@lucene.apache.org
Subject: Re: KeywordTokenizerFactory - trouble with "exact" matches

Yes, I actually noted that about the filter vs. tokenizer. It's easy to get
confused if you don't have a good understanding of the differences between
tokenizers and filters.

As for the query parser problem, there's always a workaround, but it was
nice to be made aware of. It sort of was a ghost-like problem before.
Although it would be great to have the opportunity to "disable" the
splitting on whitespace even for DisMax, I understand that it is probably not
the most wanted feature for the next Solr release :)

*Aleksander Akerø*
Systemkonsulent
Mobil: 944 89 054
E-post: aleksan...@gurusoft.no

*Gurusoft AS*
Telefon: 92 44 09 99
Østre Kullerød
www.gurusoft.no


2014-01-30 Erick Erickson :


Note, the comments about lowercasetokenizer were a red herring. You were
using LowerCaseFilterFactory. note "Filter" rather than "Tokenizer". So it
would
just do what you expected, lowercase the entire input. You would have used
LowerCaseTokenizerFactory in place of KeywordTokenizerFactory, not as a
Filter.

As for the rest, I expect Jack is right, it's the query parsing above
the field input.

Best
Erick

On Thu, Jan 30, 2014 at 6:29 AM, Aleksander Akerø
 wrote:
> Hi Srinivasa
>
> Yes I've come to understand that the analyzers will never "see" the
> whitespace, thus no need for patternreplacement, like Jack points out. 
> So

> the solution would be to set which parser to use for the query. Also Jack
> has pointed out that the "field" queryparser should work in this
particular
> setting -> http://wiki.apache.org/solr/QueryParser
>
> My problem was though, that it was only for one of the fields in the
schema
> that i needed this for, but for all the other fields, e.g. name,
> description etc., I would very much like to make use of the eDisMax
> functionality. And it seems that there can only be defined one query
parser
> per query. in other words: for all fields. Jack, you may correct me if
I'm
> wrong here :)
>
> This particular customer wanted a wildcard search at both ends of the
> phrase, and that sort of ambiguated the problem. And therefore I chose 
> to

> replace all whitespace for this field in sql at index time, using the
DIH.
> And then using EdgeNGramFilterFactory on both sides of the keyword like
the
> config below, and that seemed to work pretty nicely.
>
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" side="front"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" side="back"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> I also added a bit of extra weighting for the "keyword" field so that
exact
> matches received a higher score.
>
> What this solution doesn't do is to exclude values like "EE 009", when
> searching for "FE 009", but they return far down on the list, which for
the
> customer is ok, because usually these results are somewhat related or
> within the same category.
>
> *Aleksander Akerø*
> Systemkonsulent
> Mobil: 944 89 054
> E-post: aleksan...@gurusoft.no
>
> *Gurusoft AS*
> Telefon: 92 44 09 99
> Østre Kullerød
> www.gurusoft.no
>
>
> 2014-01-30 Jack Krupansky 
>
>> The standard, keyword-oriented query parsers will all treat unquoted,
>> unescaped white space as term delimiters and ignore the white space.
There
>> is no way to bypass that behavior. So, your regex will never even see
the
>> white space - unless you enclose the text and white space in quotes or
use
>> a backslash to quote each white space character.
>>
>> You can use the "field" and "term" query parsers to pass a query string
as
>> if it were fully enclosed in quotes, but that only handles a single 
>> term

>> and does not allow for multiple terms or any query operators. For
example:
>>
>> {!field f=myfield}Foo Bar
>>
>> See:
>> http://wiki.apache.org/solr/QueryParser
>>
>> You can also pre-configure the field query parser with the 
>> defType=field

>> parameter.
>>
>> -- Jack Krupansky
>>
>>
>> -Original Message- From: Srinivasa7
>> Sent: Thursday, January 30, 2014 6:37 AM
>>
>> To: solr-user@lucene.apache.org
>> Subject: Re: KeywordTokenizerFactory - trouble with "exact" matches
>>
>> Hi,
>>
>> I have a similar kind of problem where I want to search for words with
>> spaces
>> in them. And I wanted to search by stripping all the spaces.
>>
>> I have used the following schema for that
>>
>> > autoGeneratePhraseQueries="true"  >
>>
>>  
>>
>> 

SolR performance problem

2014-01-30 Thread MayurPanchal
Hi, 

I am working with Solr 4.2.1 on Jetty and we are facing some performance issues
and a heap memory overflow issue as well. I am searching for the actual cause
of these exceptions, so I applied a load test with different Solr queries.
After a few minutes I got the errors below.

WARN:oejs.Response:Committed before 500 {msg=Software caused connection
abort: socket write 

Caused by: java.net.SocketException: Software caused connection abort:
socket write error

SEVERE: null:org.eclipse.jetty.io.EofException


I also tried setting the maxIdleTime to 30 milliseconds, but I am still
getting the same error.
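
For reference, the setting lives on the connector in example/etc/jetty.xml;
the stock Solr example ships with a value of 50000 (fifty seconds):

    <Set name="maxIdleTime">50000</Set>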

Any ideas? 
Please help - how should I tackle this?

Thanks,
Mayur





Re: KeywordTokenizerFactory - trouble with "exact" matches

2014-01-30 Thread Aleksander Akerø
Yes, I actually noted that about the filter vs. tokenizer. It's easy to get
confused if you don't have a good understanding of the differences between
tokenizers and filters.

As for the query parser problem, there's always a workaround, but it was
nice to be made aware of. It sort of was a ghost-like problem before.
Although it would be great to have the opportunity to "disable" the
splitting on whitespace even for DisMax, I understand that it is probably not
the most wanted feature for the next Solr release :)

*Aleksander Akerø*
Systemkonsulent
Mobil: 944 89 054
E-post: aleksan...@gurusoft.no

*Gurusoft AS*
Telefon: 92 44 09 99
Østre Kullerød
www.gurusoft.no


2014-01-30 Erick Erickson :

> Note, the comments about lowercasetokenizer were a red herring. You were
> using LowerCaseFilterFactory. note "Filter" rather than "Tokenizer". So it
> would
> just do what you expected, lowercase the entire input. You would have used
> LowerCaseTokenizerFactory in place of KeywordTokenizerFactory, not as a
> Filter.
>
> As for the rest, I expect Jack is right, it's the query parsing above
> the field input.
>
> Best
> Erick
>
> On Thu, Jan 30, 2014 at 6:29 AM, Aleksander Akerø
>  wrote:
> > Hi Srinivasa
> >
> > Yes I've come to understand that the analyzers will never "see" the
> > whitespace, thus no need for patternreplacement, like Jack points out. So
> > the solution would be to set which parser to use for the query. Also Jack
> > has pointed out that the "field" queryparser should work in this
> particular
> > setting -> http://wiki.apache.org/solr/QueryParser
> >
> > My problem was though, that it was only for one of the fields in the
> schema
> > that i needed this for, but for all the other fields, e.g. name,
> > description etc., I would very much like to make use of the eDisMax
> > functionality. And it seems that there can only be defined one query
> parser
> > per query. in other words: for all fields. Jack, you may correct me if
> I'm
> > wrong here :)
> >
> > This particular customer wanted a wildcard search at both ends of the
> > phrase, and that sort of ambiguated the problem. And therefore I chose to
> > replace all whitespace for this field in sql at index time, using the
> DIH.
> > And then using EdgeNGramFilterFactory on both sides of the keyword like
> the
> > config below, and that seemed to work pretty nicely.
> >
> > <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" side="front"/>
> >     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" side="back"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> > I also added a bit of extra weighting for the "keyword" field so that
> exact
> > matches received a higher score.
> >
> > What this solution doesn't do is to exclude values like "EE 009", when
> > searching for "FE 009", but they return far down on the list, which for
> the
> > customer is ok, because usually these results are somewhat related or
> > within the same category.
> >
> > *Aleksander Akerø*
> > Systemkonsulent
> > Mobil: 944 89 054
> > E-post: aleksan...@gurusoft.no
> >
> > *Gurusoft AS*
> > Telefon: 92 44 09 99
> > Østre Kullerød
> > www.gurusoft.no
> >
> >
> > 2014-01-30 Jack Krupansky 
> >
> >> The standard, keyword-oriented query parsers will all treat unquoted,
> >> unescaped white space as term delimiters and ignore the white space.
> There
> >> is no way to bypass that behavior. So, your regex will never even see
> the
> >> white space - unless you enclose the text and white space in quotes or
> use
> >> a backslash to quote each white space character.
> >>
> >> You can use the "field" and "term" query parsers to pass a query string
> as
> >> if it were fully enclosed in quotes, but that only handles a single term
> >> and does not allow for multiple terms or any query operators. For
> example:
> >>
> >> {!field f=myfield}Foo Bar
> >>
> >> See:
> >> http://wiki.apache.org/solr/QueryParser
> >>
> >> You can also pre-configure the field query parser with the defType=field
> >> parameter.
> >>
> >> -- Jack Krupansky
> >>
> >>
> >> -Original Message- From: Srinivasa7
> >> Sent: Thursday, January 30, 2014 6:37 AM
> >>
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: KeywordTokenizerFactory - trouble with "exact" matches
> >>
> >> Hi,
> >>
> >> I have a similar kind of problem where I want to search for words with
> >> spaces
> >> in them. And I wanted to search by stripping all the spaces.
> >>
> >> I have used the following schema for that
> >>
> >> <fieldType name="..." class="..." autoGeneratePhraseQueries="true">
> >>
> >>   <analyzer type="index">
> >>     ...
> >>     <filter class="..." pattern="[^\w]+" replacement="" replace="all"/>
> >>   </analyzer>
> >>
> >>   <analyzer type="query">
> >>     ...
> >>     <filter class="..." pattern="[^\w]+" r

Re: KeywordTokenizerFactory - trouble with "exact" matches

2014-01-30 Thread Erick Erickson
Note, the comments about LowerCaseTokenizer were a red herring. You were
using LowerCaseFilterFactory; note "Filter" rather than "Tokenizer". So it would
just do what you expected: lowercase the entire input. You would have used
LowerCaseTokenizerFactory in place of KeywordTokenizerFactory, not as a Filter.
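
In config terms, the distinction is (illustrative snippets):

    <!-- whole input as one token, lowercased by a separate filter: -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>

    <!-- splits on non-letters AND lowercases, all in the tokenizer: -->
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>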

As for the rest, I expect Jack is right, it's the query parsing above
the field input.

Best
Erick

On Thu, Jan 30, 2014 at 6:29 AM, Aleksander Akerø
 wrote:
> Hi Srinivasa
>
> Yes I've come to understand that the analyzers will never "see" the
> whitespace, thus no need for patternreplacement, like Jack points out. So
> the solution would be to set which parser to use for the query. Also Jack
> has pointed out that the "field" queryparser should work in this particular
> setting -> http://wiki.apache.org/solr/QueryParser
>
> My problem was though, that it was only for one of the fields in the schema
> that i needed this for, but for all the other fields, e.g. name,
> description etc., I would very much like to make use of the eDisMax
> functionality. And it seems that there can only be defined one query parser
> per query. in other words: for all fields. Jack, you may correct me if I'm
> wrong here :)
>
> This particular customer wanted a wildcard search at both ends of the
> phrase, and that sort of complicated the problem. I therefore chose to
> replace all whitespace for this field in SQL at index time, using the DIH,
> and then to use EdgeNGramFilterFactory on both sides of the keyword like the
> config below, and that seemed to work pretty nicely.
>
>   "solr.TextField" positionIncrementGap="100">  <
> tokenizer class="solr.KeywordTokenizerFactory"/>  "solr.LowerCaseFilterFactory"/>  minGramSize="2" maxGramSize="25" side="front"/>  "solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" side="back"/>
>"solr.KeywordTokenizerFactory"/>  />  
>
> I also added a bit of extra weighting for the "keyword" field so that exact
> matches received a higher score.
>
> What this solution doesn't do is exclude values like "EE 009" when
> searching for "FE 009", but they return far down the list, which is OK for
> the customer, because usually these results are somewhat related and
> within the same category.
>
> *Aleksander Akerø*
> Systemkonsulent
> Mobil: 944 89 054
> E-post: aleksan...@gurusoft.no
>
> *Gurusoft AS*
> Telefon: 92 44 09 99
> Østre Kullerød
> www.gurusoft.no
>
>
> 2014-01-30 Jack Krupansky 
>
>> The standard, keyword-oriented query parsers will all treat unquoted,
>> unescaped white space as term delimiters and ignore the white space. There
>> is no way to bypass that behavior. So, your regex will never even see the
>> white space - unless you enclose the text and white space in quotes or use
>> a backslash to escape each white space character.
>>
>> You can use the "field" and "term" query parsers to pass a query string as
>> if it were fully enclosed in quotes, but that only handles a single term
>> and does not allow for multiple terms or any query operators. For example:
>>
>> {!field f=myfield}Foo Bar
>>
>> See:
>> http://wiki.apache.org/solr/QueryParser
>>
>> You can also pre-configure the field query parser with the defType=field
>> parameter.
>>
>> -- Jack Krupansky
>>
>>
>> -Original Message- From: Srinivasa7
>> Sent: Thursday, January 30, 2014 6:37 AM
>>
>> To: solr-user@lucene.apache.org
>> Subject: Re: KeywordTokenizerFactory - trouble with "exact" matches
>>
>> Hi,
>>
>> I have a similar kind of problem, where I want to search for words with
>> spaces in them, and I wanted to search by stripping all the spaces.
>>
>> I have used the following schema for that:
>>
>> <fieldType class="solr.TextField" autoGeneratePhraseQueries="true">
>>   <analyzer type="index">
>>     ...
>>     <filter class="solr.PatternReplaceFilterFactory" pattern="[^\w]+" replacement="" replace="all"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     ...
>>     <filter class="solr.PatternReplaceFilterFactory" pattern="[^\w]+" replacement="" replace="all"/>
>>   </analyzer>
>> </fieldType>
>>
>> And
>>
>>
>> <field ... omitNorms="true" />
>>
>>
>>
>>
>> But it is not finding the right terms. We are stripping the spaces and
>> indexing lowercase values when we do that.
>>
>>
>> Like: East Enders
>>
>> When I search for the text 'east end ers', it's not returning any values,
>> saying no document found.
>>
>> I realised that Solr uses a QueryParser before passing the query string to
>> the query analyzer defined in the schema.
>>
>> And the query parser is tokenizing the query string provided in the query, so
>> it is sending each token to the query analyzer that is defined in the schema.
>>
>> So is there any way that I can bypass this query parser, or use a correct
>> query parser which can consider the entire string as a single phrase?
>>
>> At the moment I am using the dismax query parser.
>>
>> Any suggestion would be much appreciated.
>>
>> Thanks
>> Srinivasa
>>
>>
>>
>> --
>> View this message in context: http://lucene.472066.n3.nabble.com/
>> KeywordTokenizerFactory-trouble-with-exact-matches-tp4114193p4114432.html

Re: 4.6 Core Discovery coreRootDirectory not working

2014-01-30 Thread Erick Erickson
I'm traveling and can't pursue this right now, but a couple of questions:

/home/user1/solr/core.properties exists in all these cases, right?

Tangential, but I'd be very cautious about setting core root the way you are,
since it'll walk each and every directory under /home looking for cores. Perhaps
you're just caught in that file-traversal loop (guessing here).
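
For reference, core discovery is configured in solr.xml; a minimal sketch
(the /home value just mirrors the layout described below, adjust as needed):

<solr>
  <str name="coreRootDirectory">/home</str>
</solr>

Each core is then expected to announce itself with a core.properties file
somewhere under that root.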

Do the log files show anything interesting?

I'll be able to respond occasionally between now and next week, since
we're on the road...

Best
Erick

On Wed, Jan 29, 2014 at 3:41 PM, Sam Batschelet  wrote:
> On Jan 29, 2014, at 4:31 PM, Sam Batschelet wrote:
>
>> Hello, this is my 1st post to your group. I am in the process of setting up a
>> development environment using Solr. We will require multiple cores managed
>> by multiple users in the following layout. I am running a fairly vanilla
>> version of 4.6.
>>
>> 
>> /home/camp/example/solr/solr.xml
>>
>> 
>> /home/user1/solr/core.properties
>> /home/user2/solr/core.properties
>>
>> If I manually add the core from admin everything works fine - I can index
>> etc. - but when I kill the server the core information is no longer available.
>> I need to delete the core.properties file and recreate the core from admin.
>>
>> I have since learned that this should be done with Core Discovery, mainly by
>> setting coreRootDirectory, which logically in this case should be /home. But
>> Solr is not finding the core even if I set the directory directly, i.e.
>> /home/user1/solr/ or /home/user1/. I must be missing another config and was
>> hoping for some insight.
>>
>>
>> ## solr.xml
>> 
>>  
>
> Just to point out the obvious before I get 20 responses to such: I did test
> this without the commenting :).


Re: high memory usage with small data set

2014-01-30 Thread Erick Erickson
Do the used entries in your caches increase in parallel? This would be the case
if you aren't updating your index, and would explain it. BTW, take a look at your
cache statistics (from the admin page) and look at the cache hit ratios. If they
are very small (and my guess is that with 1,500 boolean operations, you aren't
getting significant re-use) then you're just wasting space; try the cache=false
option.
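
(As an aside: for filter queries specifically, caching can be switched off
per-query with a local param - a minimal sketch, with the fq value being a
stand-in for your large filter:

fq={!cache=false}yourLargeFilterExpression

so those one-off filters don't churn the filterCache at all.)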

Also, how are you measuring memory? It's sometimes confusing that virtual
memory can be included; see:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Best,
Erick

On Wed, Jan 29, 2014 at 7:49 AM, Johannes Siegert
 wrote:
> Hi,
>
> we are using Apache Solr Cloud within a production environment. If the
> maximum heap-space is reached the Solr access time slows down, because of
> the working garbage collector for a small amount of time.
>
> We use the following configuration:
>
> - Apache Tomcat as webserver to run the Solr web application
> - 13 indices with about 150 entries (300 MB)
> - 5 server with one replication per index (5 GB max heap-space)
> - All indices have the following caches:
>   - maximum document-cache-size is 4096 entries; all other indices have
>     between 64 and 1536 entries
>   - maximum query-cache-size is 1024 entries; all other indices have
>     between 64 and 768
>   - maximum filter-cache-size is 1536 entries; all other indices have
>     between 64 and 1024
> - the directory-factory-implementation is NRTCachingDirectoryFactory
> - the index is updated once per hour (no auto commit)
> - ca. 5000 requests per hour per server
> - large filter-queries (up to 15000 bytes and 1500 boolean operations)
> - many facet-queries (30%)
>
> Behaviour:
>
> Started with 512 MB heap space. Over several days the heap space grew,
> until the 5 GB was reached. At this moment the described problem occurs.
> From this time on the heap-space usage is between 50 and 90 percent. No
> OutOfMemoryException occurs.
>
> Questions:
>
>
> 1. Why does Solr use 5 GB ram, with this small amount of data?
> 2. Which impact does the large filter-queries have in relation to ram usage?
>
> Thanks!
>
> Johannes Siegert


Re: Solr middle-ware?

2014-01-30 Thread Furkan KAMACI
Hi;

If you need this kind of thing, and if you/we can define the requirements, I
can contribute it to Solr as a part of GSOC.

Thanks;
Furkan KAMACI



2014-01-30 Jack Krupansky :

> It would be great if an example were available as part of the Solr
> release. Please file a Jira request. Maybe this could be one of the GSOC
> (Google Summer of Code) projects, or maybe somebody/everybody could submit
> their search middleware code as possible examples, attached to the Jira, so
> that even if these examples are not formally released, at least people can
> view and copy them.
>
> -- Jack Krupansky
>
> -Original Message- From: Alexandre Rafalovitch
> Sent: Tuesday, January 21, 2014 8:00 AM
>
> To: solr-user@lucene.apache.org
> Subject: Solr middle-ware?
>
> Hello,
>
> All the Solr documents talk about not exposing Solr directly to the
> cloud. But I see people keep asking for a thin, secure layer in front
> of Solr that they can talk to from JavaScript, perhaps with some basic
> extension options.
>
> Has anybody actually written one? Open source or in a community part
> of larger project? I would love to be able to point people at
> something.
>
> Is there something particularly difficult about writing one? Does
> anybody have a story of an aborted attempt or mid-point reversal? I would
> like to know.
>
> Regards,
>   Alex.
> P.s. Personal context: I am thinking of doing a series of lightweight
> examples of how to use Solr. Like I did for a book, but with a bit
> more depth and something that can actually be exposed to the live web
> with live data. I don't want to reinvent the wheel of the thin Solr
> middleware.
> P.p.s. Though I keep thinking that Dart could make an interesting
> option for the middleware as it could have the same codebase on the
> server and in the client. Like NodeJS, but with saner syntax.
>
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>


Re: Regarding Solr Faceting on the query response.

2014-01-30 Thread Alexei Martchenko
I believe it's not possible to facet only the page you are on; faceting is
supposed to work only with the full result set. I've never tried, but I've
never seen a way this could be done.
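
To illustrate with the query from earlier in the thread (a sketch; these are
the standard faceting parameters):

q=company:Apple AND technologies:java&facet=true&facet.field=company&facet.mincount=1

The counts are computed over the full result set (numFound), and
facet.mincount=1 suppresses values that match nothing in that result set.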


alexei martchenko
Facebook | LinkedIn | Steam | 4sq | Skype: alexeiramone | Github | (11) 9 7613.0966


2014-01-30 Mikhail Khludnev :

> Hello
> Do you mean setting
> http://wiki.apache.org/solr/SimpleFacetParameters#facet.mincount to 1 or
> you want to facet only returned page (rows) instead of full resultset
> (numFound) ?
>
>
> On Thu, Jan 30, 2014 at 6:24 AM, Nilesh Kuchekar
> wrote:
>
> > Yeah it's a typo... I meant company:Apple
> >
> > Thanks
> > Nilesh
> >
> > > On Jan 29, 2014, at 8:59 PM, Alexandre Rafalovitch  >
> > wrote:
> > >
> > >> On Thu, Jan 30, 2014 at 3:43 AM, Kuchekar 
> > wrote:
> > >> company=Apple
> > > Did you mean company:Apple ?
> > >
> > > Otherwise, that could be the issue.
> > >
> > > Regards,
> > >   Alex.
> > >
> > >
> > > Personal website: http://www.outerthoughts.com/
> > > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > > - Time is the quality of nature that keeps events from happening all
> > > at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> > > book)
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
>  
>


Re: Solr middle-ware?

2014-01-30 Thread Jack Krupansky
It would be great if an example were available as part of the Solr release. 
Please file a Jira request. Maybe this could be one of the GSOC (Google 
Summer of Code) projects, or maybe somebody/everybody could submit their 
search middleware code as possible examples, attached to the Jira, so that 
even if these examples are not formally released, at least people can view 
and copy them.


-- Jack Krupansky

-Original Message- 
From: Alexandre Rafalovitch

Sent: Tuesday, January 21, 2014 8:00 AM
To: solr-user@lucene.apache.org
Subject: Solr middle-ware?

Hello,

All the Solr documents talk about not exposing Solr directly to the
cloud. But I see people keep asking for a thin, secure layer in front
of Solr that they can talk to from JavaScript, perhaps with some basic
extension options.

Has anybody actually written one? Open source or in a community part
of larger project? I would love to be able to point people at
something.

Is there something particularly difficult about writing one? Does
anybody have a story of an aborted attempt or mid-point reversal? I would
like to know.

Regards,
  Alex.
P.s. Personal context: I am thinking of doing a series of lightweight
examples of how to use Solr. Like I did for a book, but with a bit
more depth and something that can actually be exposed to the live web
with live data. I don't want to reinvent the wheel of the thin Solr
middleware.
P.p.s. Though I keep thinking that Dart could make an interesting
option for the middleware as it could have the same codebase on the
server and in the client. Like NodeJS, but with saner syntax.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book) 



SOLR suggester with highlighting

2014-01-30 Thread Jorge Sanchez
Hello,

I am trying to make a typeahead autocomplete with Solr using the suggester.

The search will be done for users and group names which aggregate users.
The search will be done on usernames, bio, web page and other stuff. What
I want to achieve is a sort of "Facebook"- or "Twitter"-like search. For this
I need to enrich the result from Solr with additional data (user type, URL
of the profile, his avatar URL etc.).

The user and group would have the ID field in Solr, which would correspond
to the ID in the DB to get this information. I am stuck on how to do that.

Currently I have the suggester working, but it only returns the suggested
value; when I try to return some other attribute from the document it
doesn't work.

Here is the part of the solrconfig:

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggester_dictionary</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookupFactory</str>
    <str name="field">autocomplete</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck.dictionary">suggester_dictionary</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.collate">false</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
    <str>highlight</str>
  </arr>
</requestHandler>

and the schema:

   ...
The query:
http://gruppu.com:8983/solr/suggest?q=*:*&spellcheck.q=jo&spellcheck=true&hl=on&hl.fl=groupid

The response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="jo">
        <int name="numFound">2</int>
        <int name="startOffset">0</int>
        <int name="endOffset">2</int>
        <arr name="suggestion">
          <str>jorge</str>
          <str>jorgen</str>
        </arr>
      </lst>
    </lst>
  </lst>
</response>

I would like to have the groupid and grouporuser fields returned ... No
luck so far.
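
(A hedged workaround, sketched rather than taken from this thread: instead of
the spellcheck-based suggester, query an edge-ngrammed copy of the names
through a plain /select handler, so fl can return the stored fields, e.g.

http://gruppu.com:8983/solr/select?q=autocomplete:jo&fl=id,groupid,grouporuser&rows=5

assuming "autocomplete" is indexed with edge n-grams and groupid/grouporuser
are stored fields.)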


Re: how to write an efficient query with a subquery to restrict the search space?

2014-01-30 Thread Jack Krupansky
Lucene's default scoring should give you much of what you want - ranking 
hits of low-frequency terms higher - without any special query syntax - just 
list out your terms and use "OR" as your default operator.
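
For illustration, that boils down to a request like (field and value names
from your example; q.op sets the default operator):

q=field1:val1 OR field2:val2 OR field3:val3 OR field4:val4&q.op=OR&rows=100&fl=*

Documents matching the rare terms (val2, val4) will tend to rank higher
simply because of idf.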


-- Jack Krupansky

-Original Message- 
From: svante karlsson

Sent: Thursday, January 23, 2014 6:42 AM
To: solr-user@lucene.apache.org
Subject: how to write an efficient query with a subquery to restrict the 
search space?


I have a solr db containing 1 billion records that I'm trying to use in a
NoSQL fashion.

What I want to do is find the best matches using all search terms but
restrict the search space to the most unique terms

In this example I know that val2 and val4 are rare terms and val1 and val3
are more common. In my real scenario I'll have 20 fields that I want to
include or exclude in the inner query depending on the uniqueness of the
requested value.


my first approach was:
q=field1:val1 OR field2:val2 OR field3:val3 OR field4:val4 AND (field2:val2
OR field4:val4)&rows=100&fl=*

but what I think I get is
   field4:val4 AND (field2:val2 OR field4:val4) - this result is then
OR'ed with the rest

if I write
q=(field1:val1 OR field2:val2 OR field3:val3 OR field4:val4) AND
(field2:val2 OR field4:val4)&rows=100&fl=*

then what I think I get is two sub-queries that are evaluated separately and
then joined - performance-wise this is bad.

Whats the best way to write these types of queries?


Are there any performance issues when running it on several solrcloud nodes
vs a single instance or should it scale?



/svante 



Re: Not finding part of fulltext field when word ends in dot

2014-01-30 Thread Jack Krupansky
The word delimiter filter will turn 26KA into two tokens, as if you had 
written "26 KA" without the quotes. The autoGeneratePhraseQueries option 
will cause the multiple terms to be treated as if they actually were 
enclosed within quotes, otherwise they will be treated as separate and 
unquoted terms. If you do enclose "26KA" in quotes in your query then 
autoGeneratePhraseQueries is not relevant.


Ah... maybe the problem is that you have preserveOriginal="true" in your 
query analyzer. Do you have your default query operator set to "AND"? If so, 
it would treat "26KA" as "26" AND "KA" AND "26KA", which requires that 
"26KA" (without the trailing dot) be in the index.


It seems counter-intuitive, but the attributes of the index and query word 
delimiter filters need to be slightly asymmetric.
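
A minimal sketch of that asymmetry (attribute values are illustrative, not
taken from your schema): catenate and preserve the original on the index side
only:

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
          generateNumberParts="1" catenateWords="1" catenateNumbers="1"
          catenateAll="1" preserveOriginal="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
          generateNumberParts="1" catenateWords="0" catenateNumbers="0"
          catenateAll="0" preserveOriginal="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>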


-- Jack Krupansky

-Original Message- 
From: Thomas Michael Engelke

Sent: Thursday, January 30, 2014 2:16 AM
To: solr-user@lucene.apache.org
Subject: Re: Not finding part of fulltext field when word ends in dot

I'm not sure I got my problem across. If I understand the snippet of
documentation right, autoGeneratePhraseQueries only affects queries that
result in multiple tokens, which mine does not. The version also is
3.6.0.1, and we're not planning on upgrading to any 4.x version.


2014-01-29 Jack Krupansky 


You might want to add autoGeneratePhraseQueries="true" to your field
type, but I don't think that would cause a break when going from 3.6 to
4.x. The default for that attribute changed in Solr 3.5. What release was
your data indexed using? There may have been some subtle word delimiter
filter changes between 3.x and 4.x.

Read:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201202.mbox/%3CC0551C512C863540BC59694A118452AA0764A434@ITS-EMBX-03.adsroot.itcs.umich.edu%3E



-Original Message- From: Thomas Michael Engelke
Sent: Wednesday, January 29, 2014 11:16 AM
To: solr-user@lucene.apache.org
Subject: Re: Not finding part of fulltext field when word ends in dot


The fieldType definition is a tad on the longer side:

   ...

Thank you for taking a look.


2014-01-29 Jack Krupansky 

 What field type and analyzer/tokenizer are you using?


-- Jack Krupansky

-Original Message- 
From: Thomas Michael Engelke
Sent: Wednesday, January 29, 2014 10:45 AM
To: solr-user@lucene.apache.org
Subject: Not finding part of fulltext field when word ends in dot
Hello everybody,

we have a legacy Solr installation in version 3.6.0.1. One of the indices
defines a field named "content" as a fulltext field where a product
description will reside. One of the records indexed contains the following
data (excerpt):

z. B. in der Serie 26KA.

I had the problem that searching the value "26KA" didn't find anything.
Using the analyzer of the administrative interface, with the full text on one
hand and "26KA" as the query string on the other, I can see how the search
string is transformed by the filter factories used. The
WordDelimiterFilterFactory transforms the "26KA." into "26KA", which is
displayed like this (excerpt):

position:  73    74     75       76
token:     in    der    Serie    26KA.
WDF:                             26KA

It seems that it stripped the dot off "26KA.". Using the option to
highlight matches, an analysis search of "26KA" shows the lower of the two
entries matches (after reaching the LowerCaseFilterFactory). However,
querying the index using the query interface doesn't show any matches.

I discovered that adding an asterisk to the search seems to work, as does
adding the dot. I am puzzled by this, as I thought that the second added
entry was the word actually indexed. I've tried looking up the definition
of the administrative interface, but the documentation only specifies this
for the latest version, where the display is different and (at least in the
sample) doesn't show such "duplication".

Can anybody shed some light onto this?








Re: KeywordTokenizerFactory - trouble with "exact" matches

2014-01-30 Thread Aleksander Akerø
Hi Srinivasa

Yes, I've come to understand that the analyzers will never "see" the
whitespace, thus no need for pattern replacement, like Jack points out. So
the solution would be to set which parser to use for the query. Also Jack
has pointed out that the "field" query parser should work in this particular
setting -> http://wiki.apache.org/solr/QueryParser

My problem was, though, that it was only for one of the fields in the schema
that I needed this, but for all the other fields, e.g. name,
description etc., I would very much like to make use of the eDisMax
functionality. And it seems that there can only be defined one query parser
per query, in other words: for all fields. Jack, you may correct me if I'm
wrong here :)

This particular customer wanted a wildcard search at both ends of the
phrase, and that sort of complicated the problem. I therefore chose to
replace all whitespace for this field in SQL at index time, using the DIH,
and then to use EdgeNGramFilterFactory on both sides of the keyword like the
config below, and that seemed to work pretty nicely.

<fieldType class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" side="front"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25" side="back"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I also added a bit of extra weighting for the "keyword" field so that exact
matches received a higher score.

What this solution doesn't do is exclude values like "EE 009" when
searching for "FE 009", but they return far down the list, which is OK for
the customer, because usually these results are somewhat related and
within the same category.

*Aleksander Akerø*
Systemkonsulent
Mobil: 944 89 054
E-post: aleksan...@gurusoft.no

*Gurusoft AS*
Telefon: 92 44 09 99
Østre Kullerød
www.gurusoft.no


2014-01-30 Jack Krupansky 

> The standard, keyword-oriented query parsers will all treat unquoted,
> unescaped white space as term delimiters and ignore the white space. There
> is no way to bypass that behavior. So, your regex will never even see the
> white space - unless you enclose the text and white space in quotes or use
> a backslash to escape each white space character.
>
> You can use the "field" and "term" query parsers to pass a query string as
> if it were fully enclosed in quotes, but that only handles a single term
> and does not allow for multiple terms or any query operators. For example:
>
> {!field f=myfield}Foo Bar
>
> See:
> http://wiki.apache.org/solr/QueryParser
>
> You can also pre-configure the field query parser with the defType=field
> parameter.
>
> -- Jack Krupansky
>
>
> -Original Message- From: Srinivasa7
> Sent: Thursday, January 30, 2014 6:37 AM
>
> To: solr-user@lucene.apache.org
> Subject: Re: KeywordTokenizerFactory - trouble with "exact" matches
>
> Hi,
>
> I have a similar kind of problem, where I want to search for words with
> spaces in them, and I wanted to search by stripping all the spaces.
>
> I have used the following schema for that:
>
> <fieldType class="solr.TextField" autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     ...
>     <filter class="solr.PatternReplaceFilterFactory" pattern="[^\w]+" replacement="" replace="all"/>
>   </analyzer>
>   <analyzer type="query">
>     ...
>     <filter class="solr.PatternReplaceFilterFactory" pattern="[^\w]+" replacement="" replace="all"/>
>   </analyzer>
> </fieldType>
> And
>
>
> <field ... omitNorms="true" />
>
>
>
>
> But it is not finding the right terms. We are stripping the spaces and
> indexing lowercase values when we do that.
>
>
> Like: East Enders
>
> When I search for the text 'east end ers', it's not returning any values,
> saying no document found.
>
> I realised that Solr uses a QueryParser before passing the query string to
> the query analyzer defined in the schema.
>
> And the query parser is tokenizing the query string provided in the query, so
> it is sending each token to the query analyzer that is defined in the schema.
>
> So is there any way that I can bypass this query parser, or use a correct
> query parser which can consider the entire string as a single phrase?
>
> At the moment I am using the dismax query parser.
>
> Any suggestion would be much appreciated.
>
> Thanks
> Srinivasa
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/
> KeywordTokenizerFactory-trouble-with-exact-matches-tp4114193p4114432.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: ant eclipse hangs - branch_4x

2014-01-30 Thread Steve Rowe
Hi Per,

You may be seeing the stale-Ivy-lock problem (see IVY-1388). LUCENE-4636
upgraded the bootstrapped Ivy to 2.3.0 to reduce the likelihood of this
problem, so the first thing is to make sure you have that version in
~/.ant/lib/ - if not, remove the Ivy jar that's there and run 'ant
ivy-bootstrap' to download and put the 2.3.0 jar in place.

You should run the following and remove any files it finds:

find ~/.ivy2/cache -name '*.lck'

That should stop 'ant resolve' from hanging.
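
For example, to remove them in one pass (assuming a find that supports
-delete, as GNU and BSD find do):

find ~/.ivy2/cache -name '*.lck' -delete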

Steve 
 
On Jan 30, 2014, at 5:06 AM, Per Steffensen  wrote:

> Hi
> 
> Earlier I used to be able to successfully run "ant eclipse" from branch_4x.
> With the newest code (tip of branch_4x today) I can't: "ant eclipse" hangs
> forever at the point shown by the console output below. I noticed that this
> problem has been around for a while - not something that happened today. Any
> idea about what might be wrong? A solution? Help to debug?
> 
> Regards Per Steffensen
> 
> --- console when running "ant eclipse" -
> 
> ...
> 
> resolve:
> [echo] Building solr-example-DIH...
> 
> ivy-availability-check:
> [echo] Building solr-example-DIH...
> 
> ivy-fail:
> 
> ivy-configure:
> [ivy:configure] :: loading settings :: file = 
> /Some/Path/ws_kepler_apache_lucene_solr_4x/branch_4x/lucene/ivy-settings.xml
> 
> resolve:
> 
> resolve:
> [echo] Building solr-core...
> 
> ivy-availability-check:
> [echo] Building solr-core...
> 
> ivy-fail:
> 
> ivy-fail:
> 
> ivy-configure:
> [ivy:configure] :: loading settings :: file = 
> /Some/Path/ws_kepler_apache_lucene_solr_4x/branch_4x/lucene/ivy-settings.xml
> 
> resolve:
> 
> HERE IT JUST HANGS FOREVER
> -



Re: KeywordTokenizerFactory - trouble with "exact" matches

2014-01-30 Thread Jack Krupansky
The standard, keyword-oriented query parsers will all treat unquoted,
unescaped white space as term delimiters and ignore the white space. There is
no way to bypass that behavior. So, your regex will never even see the white
space - unless you enclose the text and white space in quotes or use a
backslash to escape each white space character.


You can use the "field" and "term" query parsers to pass a query string as 
if it were fully enclosed in quotes, but that only handles a single term and 
does not allow for multiple terms or any query operators. For example:


{!field f=myfield}Foo Bar

See:
http://wiki.apache.org/solr/QueryParser

You can also pre-configure the field query parser with the defType=field 
parameter.
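
For illustration, three ways to get the white space past the query parser
("myfield" is just the example field name used above):

q=myfield:"Foo Bar"         (quotes)
q=myfield:Foo\ Bar          (backslash-escaped space)
q={!field f=myfield}Foo Bar (field query parser)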


-- Jack Krupansky


-Original Message- 
From: Srinivasa7

Sent: Thursday, January 30, 2014 6:37 AM
To: solr-user@lucene.apache.org
Subject: Re: KeywordTokenizerFactory - trouble with "exact" matches

Hi,

I have a similar kind of problem, where I want to search for words with spaces
in them, and I wanted to search by stripping all the spaces.

I have used the following schema for that:

<fieldType class="solr.TextField" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    ...
    <filter class="solr.PatternReplaceFilterFactory" pattern="[^\w]+" replacement="" replace="all"/>
  </analyzer>
  <analyzer type="query">
    ...
    <filter class="solr.PatternReplaceFilterFactory" pattern="[^\w]+" replacement="" replace="all"/>
  </analyzer>
</fieldType>

And

<field ... omitNorms="true" />



But it is not finding the right terms. We are stripping the spaces and
indexing lowercase values when we do that.


Like: East Enders

When I search for the text 'east end ers', it's not returning any values,
saying no document found.

I realised that Solr uses a QueryParser before passing the query string to
the query analyzer defined in the schema.

And the query parser is tokenizing the query string provided in the query, so
it is sending each token to the query analyzer that is defined in the schema.

So is there any way that I can bypass this query parser, or use a correct
query parser which can consider the entire string as a single phrase?

At the moment I am using the dismax query parser.

Any suggestion would be much appreciated.

Thanks
Srinivasa



--
View this message in context: 
http://lucene.472066.n3.nabble.com/KeywordTokenizerFactory-trouble-with-exact-matches-tp4114193p4114432.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: KeywordTokenizerFactory - trouble with "exact" matches

2014-01-30 Thread Srinivasa7
Aleksander Akerø,
It would be great if you could share how you are handling this on a
per-field basis.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/KeywordTokenizerFactory-trouble-with-exact-matches-tp4114193p4114435.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: KeywordTokenizerFactory - trouble with "exact" matches

2014-01-30 Thread Srinivasa7
Hi, 

I have a similar kind of problem, where I want to search for words with spaces
in them, and I wanted to search by stripping all the spaces.

I have used the following schema for that:

<fieldType class="solr.TextField" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    ...
    <filter class="solr.PatternReplaceFilterFactory" pattern="[^\w]+" replacement="" replace="all"/>
  </analyzer>
  <analyzer type="query">
    ...
    <filter class="solr.PatternReplaceFilterFactory" pattern="[^\w]+" replacement="" replace="all"/>
  </analyzer>
</fieldType>

And

<field ... omitNorms="true" />

But it is not finding the right terms. We are stripping the spaces and
indexing lowercase values when we do that.


Like: East Enders

When I search for the text 'east end ers', it's not returning any values,
saying no document found.

I realised that Solr uses a QueryParser before passing the query string to
the query analyzer defined in the schema.

And the query parser is tokenizing the query string provided in the query, so
it is sending each token to the query analyzer that is defined in the schema.

So is there any way that I can bypass this query parser, or use a correct
query parser which can consider the entire string as a single phrase?

At the moment I am using the dismax query parser.

Any suggestion would be much appreciated.

Thanks
Srinivasa



--
View this message in context: 
http://lucene.472066.n3.nabble.com/KeywordTokenizerFactory-trouble-with-exact-matches-tp4114193p4114432.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Use a field without predefining it it the schema

2014-01-30 Thread Hakim Benoudjit
Thanks, that's a good feature since I don't have to reindex the whole data
set, nor restart the Solr app.


2014-01-30 Steve Rowe 

> Hakim,
>
> All the fields you have added manually to the schema will be kept when you
> switch to using managed schema.
>
> From the managed schema page on the Solr Reference Guide you linked to
> (describing what happens after you add <schemaFactory
> class="ManagedIndexSchemaFactory">...</schemaFactory> to your solrconfig.xml,
> and then restart Solr in order for the change to take effect):
>
> Once Solr is restarted, the existing schema.xml file is renamed to
> schema.xml.bak and the contents are written to a file with the name
> defined as the managedSchemaResourceName.
>
> Steve
>
> On Jan 29, 2014, at 7:15 PM, Hakim Benoudjit 
> wrote:
>
> > I have found this link:
> > https://cwiki.apache.org/confluence/display/solr/Managed+Schema+Definition+in+SolrConfig
> > I don't know if it's required to modify the schema (see the link) to make
> > it editable by the REST API. I hope that it doesn't clear all the fields
> > that I have added manually to the schema.
> >
> >
> > 2014-01-30 Hakim Benoudjit 
> >
> >> Thanks Steve for the link.
> >> It seems very easy to create `new fields` in the `schema` using the `POST
> >> request`. But does that mean that I don't have to restart the `solr app`?
> >> If so, is this feature available in the latest solr version (`v4.6`)?
> >>
> >>
> >> 2014-01-29 Alexandre Rafalovitch 
> >>
> >> There is an example in the distribution that shows how new fields are
> >>> auto-defined. I think it is example-schemaless. The secret is in the
> >>> UpdateRequestProcessor chain that does cleanup and auto-mapping. Plus
> >>> - I guess - automatically generated schema.
> >>>
> >>> Just remember that once the field is added the first time, it now
> >>> exists. So be careful not to send a date-looking thing into what should
> >>> be a text field.
> >>>
> >>> Regards,
> >>>   Alex.
> >>> Personal website: http://www.outerthoughts.com/
> >>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> >>> - Time is the quality of nature that keeps events from happening all
> >>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> >>> book)
> >>>
> >>>
> >>> On Wed, Jan 29, 2014 at 5:45 AM, Steve Rowe  wrote:
>  Hi Hakim,
> 
>  Check out the section of the Solr Reference Guide on modifying the
> >>> schema via REST API:
> 
> 
> >>>
> https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-Modifytheschema
> 
>  Steve
> 
>  On Jan 28, 2014, at 5:00 PM, Hakim Benoudjit 
> >>> wrote:
> 
> > Hi guys,
> >
> > With the new version of Solr (4.6), can I add a field to the index, knowing
> > that this field doesn't appear (isn't predefined) in the schema?
> >
> > I ask this question because I ve seen an issue (on jira) related to
> >>> this.
> >
> > Thanks!
> 
> >>>
> >>
> >>
>
>
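
For reference, a minimal sketch of adding one field via the 4.x Schema API
(the field name and attributes here are made-up examples; see the "Modify the
schema" page above for the exact payload your version supports):

curl -X PUT 'http://localhost:8983/solr/collection1/schema/fields/newfield' \
  -H 'Content-type: application/json' \
  -d '{"type":"text_general", "stored":true}'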


Re: Concurrency handling in DataImportHandler

2014-01-30 Thread Dileepa Jayakody
Hi All,

I triggered a /dataimport for the first 100 rows from my database and, while
it was running, issued another import request for rows 101-200.

In my log I see the exception below; it seems multiple JDBC connections cannot
be opened. Does this mean concurrency is not supported in DIH for JDBC
datasources?

Please share your thoughts on how to tackle concurrency in dataimport..

[Thread-15] ERROR org.apache.solr.handler.dataimport.JdbcDataSource  -
Ignoring Error when closing connection
java.sql.SQLException: Streaming result set
com.mysql.jdbc.RowDataDynamic@1e820764 is still active. No statements may
be issued when any streaming result sets are open and in use on a given
connection. Ensure that you have called .close() on any active streaming
result sets before attempting more queries.
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:927)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:924)
at
com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:3314)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2477)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2731)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2809)
at com.mysql.jdbc.ConnectionImpl.rollbackNoChecks(ConnectionImpl.java:5165)
at com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:5048)
at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4654)
at com.mysql.jdbc.ConnectionImpl.close(ConnectionImpl.java:1630)
at
org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:410)
at
org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:395)
at
org.apache.solr.handler.dataimport.DocBuilder.closeEntityProcessorWrappers(DocBuilder.java:284)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:273)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)


Thanks,
Dileepa
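
One pattern sometimes used for this - a sketch, not taken from the thread: a
single DIH request handler only runs one import at a time, so concurrent
imports are usually set up as separate handlers, each with its own importer
and JDBC connection:

<requestHandler name="/dataimport1"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults"><str name="config">data-config.xml</str></lst>
</requestHandler>
<requestHandler name="/dataimport2"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults"><str name="config">data-config.xml</str></lst>
</requestHandler>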


On Thu, Jan 30, 2014 at 4:13 PM, Dileepa Jayakody  wrote:

> I would particularly like to know how DIH handles concurrency in JDBC
> database connections during datamport..
>
> <dataSource type="JdbcDataSource"
> url="jdbc:mysql://localhost:3306/solrtest" user="usr1" password="123"
> batchSize="1" />
>
> Thanks,
> Dileepa
>
>
> On Thu, Jan 30, 2014 at 4:05 PM, Dileepa Jayakody <
> dileepajayak...@gmail.com> wrote:
>
>> Hi All,
>>
>> Can I please know about how concurrency is handled in the DIH?
>> What happens if multiple /dataimport requests are issued to the same
>> Datasource?
>>
>> I'm doing some custom processing at the end of dataimport process as an
>> EventListener configured in the data-config.xml as below.
>> <document onImportEnd="com.solr.stanbol.processor.StanbolEventListener">
>>
>> Will each DIH request create a new EventListener object?
>>
>> I'm copying some field values from my custom processor configured in the
>> /dataimport request handler to a static Map in my StanbolEventListener
>> class.
>> I need to figure out how to handle concurrency when data is copied to my
>> EventListener object to perform the rest of my update process.
>>
>> Thanks,
>> Dileepa
>>
>
>


Re: Concurrency handling in DataImportHandler

2014-01-30 Thread Dileepa Jayakody
I would particularly like to know how DIH handles concurrency in JDBC
database connections during datamport..



Thanks,
Dileepa


On Thu, Jan 30, 2014 at 4:05 PM, Dileepa Jayakody  wrote:

> Hi All,
>
> Can I please know about how concurrency is handled in the DIH?
> What happens if multiple /dataimport requests are issued to the same
> Datasource?
>
> I'm doing some custom processing at the end of dataimport process as an
> EventListener configured in the data-config.xml as below.
> <document onImportEnd="com.solr.stanbol.processor.StanbolEventListener">
>
> Will each DIH request create a new EventListener object?
>
> I'm copying some field values from my custom processor configured in the
> /dataimport request handler to a static Map in my StanbolEventListener
> class.
> I need to figure out how to handle concurrency when data is copied to my
> EventListener object to perform the rest of my update process.
>
> Thanks,
> Dileepa
>


Concurrency handling in DataImportHandler

2014-01-30 Thread Dileepa Jayakody
Hi All,

Can I please know about how concurrency is handled in the DIH?
What happens if multiple /dataimport requests are issued to the same
Datasource?

I'm doing some custom processing at the end of dataimport process as an
EventListener configured in the data-config.xml as below.
<document onImportEnd="com.solr.stanbol.processor.StanbolEventListener">

Will each DIH request create a new EventListener object?

I'm copying some field values from my custom processor configured in the
/dataimport request handler to a static Map in my StanbolEventListener
class.
I need to figure out how to handle concurrency when data is copied to my
EventListener object to perform the rest of my update process.

Thanks,
Dileepa


Re: Lucene Join

2014-01-30 Thread Michael McCandless
Look in lucene's join module?

Mike McCandless

http://blog.mikemccandless.com


On Thu, Jan 30, 2014 at 4:15 AM, anand chandak  wrote:
> Hi,
>
>
> I am trying to find out whether the Lucene joins (not the Solr join) use
> any filter cache. The API that Lucene uses for joining is
> JoinUtil.createJoinQuery(); where can I find the source code for this API?
>
>
> Thanks in advance
>
> Thanks,
>
> Anand
>


ant eclipse hangs - branch_4x

2014-01-30 Thread Per Steffensen

Hi

Earlier I used to be able to successfully run "ant eclipse" from
branch_4x. With the newest code (tip of branch_4x today) I can't: "ant
eclipse" hangs forever at the point shown by the console output below. I
noticed that this problem has been around for a while - not something
that happened today. Any idea about what might be wrong? A solution?
Help to debug?


Regards Per Steffensen

--- console when running "ant eclipse" -

...

resolve:
 [echo] Building solr-example-DIH...

ivy-availability-check:
 [echo] Building solr-example-DIH...

ivy-fail:

ivy-configure:
[ivy:configure] :: loading settings :: file = 
/Some/Path/ws_kepler_apache_lucene_solr_4x/branch_4x/lucene/ivy-settings.xml


resolve:

resolve:
 [echo] Building solr-core...

ivy-availability-check:
 [echo] Building solr-core...

ivy-fail:

ivy-fail:

ivy-configure:
[ivy:configure] :: loading settings :: file = 
/Some/Path/ws_kepler_apache_lucene_solr_4x/branch_4x/lucene/ivy-settings.xml


resolve:

HERE IT JUST HANGS FOREVER
-


Lucene Join

2014-01-30 Thread anand chandak

Hi,


I am trying to find out whether the Lucene joins (not the Solr join) use any
filter cache. The API that Lucene uses for joining is
JoinUtil.createJoinQuery(); where can I find the source code for this API?



Thanks in advance

Thanks,

Anand



Re: KeywordTokenizerFactory - trouble with "exact" matches

2014-01-30 Thread Aleksander Akerø
Tried the following config for setting the autoGeneratePhraseQueries but it
didn't seem to change anything. Tested both "true" and "false".

<fieldType class="solr.TextField" autoGeneratePhraseQueries="true" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    ...
  </analyzer>
</fieldType>

Still I do not get any matches when searching for "FE 009" without quotes.

Set debugQuery to "on" and this is what it shows. Definitely looks like it
does this MultiPhraseQuery thing.

<str name="rawquerystring">FE 009</str>
<str name="querystring">FE 009</str>
<str name="parsedquery">(+(DisjunctionMaxQuery((number:FE)) DisjunctionMaxQuery((number:009))))/no_coord</str>
<str name="parsedquery_toString">+((number:FE) (number:009))</str>
<str name="QParser">ExtendedDismaxQParser</str>

I also looked into these query parsers, but it looks like the splitting on
whitespace is done by the dismax query parser before the terms are passed to
any analyzers. And it is vital to me that I can differentiate this on a
per-field basis.
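
(One possibility worth testing - a sketch, not something confirmed in this
thread: the _query_ hook allows mixing parsers per clause in a single
request, e.g.

q=_query_:"{!field f=number}FE 009" OR _query_:"{!edismax qf='name description'}FE 009"

so the "number" field gets the field parser while the other fields stay on
eDisMax.)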

*Aleksander Akerø*
Systemkonsulent
Mobil: 944 89 054
E-post: aleksan...@gurusoft.no

*Gurusoft AS*
Telefon: 92 44 09 99
Østre Kullerød
www.gurusoft.no


2014-01-29 Aleksander Akerø 

> Thanks a lot, I'll try the autoGeneratePhraseQueries property and see how
> that works.
>
> Regarding the reindexing tip, it's a good tip, but due to my current
> "on the fly" setup on the servers at work I basically have to build a
> project with Maven and deploy to Tomcat, wherein the index lies, and I
> therefore have to reindex each time, otherwise the index would be empty.
> Also I usually use the "clean" parameter when testing with DIH. So that
> shouldn't be a problem.
>
> *Aleksander Akerø*
> Systemkonsulent
> Mobil: 944 89 054
> E-post: aleksan...@gurusoft.no
>
> *Gurusoft AS*
> Telefon: 92 44 09 99
> Østre Kullerød
> www.gurusoft.no
>
>
> 2014-01-29 Alexandre Rafalovitch 
>
> I think the whitespace might also be the issue. The query gets parsed
>> by standard component that splits it on space before passing
>> individual components into the field searches.
>>
>> Try enabling autoGeneratePhraseQueries on the field (or field type)
>> and reindexing. See if that makes a difference.
>>
>> Regards,
>>   Alex.
>> Personal website: http://www.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all
>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>> book)
>>
>>
>> On Wed, Jan 29, 2014 at 9:55 PM, Aleksander Akerø
>>  wrote:
>> > update:
>> >
>> > Guessing that this has nothing to do with the tokenizer. Tried to use the
>> > string fieldtype as well, but still the same results. So this must have to
>> > do with some other Solr config.
>> >
>> > What confuses me is that when I search "1005", which is another valid
>> > value to search for, it works perfectly - but then again, this query
>> > contains no whitespace.
>> >
>> > Any ideas?
>> >
>> > *Aleksander Akerø*
>> > Systemkonsulent
>> > Mobil: 944 89 054
>> > E-post: aleksan...@gurusoft.no
>> >
>> > *Gurusoft AS*
>> > Telefon: 92 44 09 99
>> > Østre Kullerød
>> > www.gurusoft.no
>> >
>> >
>> > 2014-01-29 Aleksander Akerø 
>> >
>> >> Thanks for the quick answer, but it doesn't help if I remove the
>> >> lowercase filter like so:
>> >>
>> >> <fieldType class="solr.TextField" positionIncrementGap="100">
>> >>   <analyzer>
>> >>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>> >>   </analyzer>
>> >> </fieldType>
>> >>
>> >> I still need to add quotes to the search query to get results. And the
>> >> weird thing is that if I use the analyzer and put in "FE 009" (again,
>> >> without quotes) for both index and query values, it highlights the result
>> >> as if to show a match, but when I search using the GUI it gives me no
>> >> results. The same happens when posting directly to the /select
>> >> requestHandler via GET.
>> >>
>> >> This is what I post using GET:
>> >> http://mysite.com/solr/corename/select?q=number:FE%20009&qf=number
>>  =>
>> >> this does not work
>> >> http://mysite.com/solr/corename/select?q=number:"FE%20009"&qf=number
>>  =>
>> >> this works
>> >>
>> >> Really starting to wonder if I am doing something terribly wrong
>> somewhere.
>> >>
>> >> This is my requestHandler btw, pretty basic:
>> >> <requestHandler name="/select" class="solr.SearchHandler">
>> >>   <lst name="defaults">
>> >>     <str name="echoParams">explicit</str>
>> >>     <str name="defType">edismax</str>
>> >>     <str name="q.alt">*:*</str>
>> >>     <int name="rows">10</int>
>> >>     <str name="fl">*,score</str>
>> >>     <str name="qf">number</str>
>> >>   </lst>
>> >> </requestHandler>
>> >>
>> >> *Aleksander Akerø*
>> >> Systemkonsulent
>> >> Mobil: 944 89 054
>> >> E-post: aleksan...@gurusoft.no
>> >>
>> >> *Gurusoft AS*
>> >> Telefon: 92 44 09 99
>> >> Østre Kullerød
>> >> www.gurusoft.no
>> >>
>> >>
>> >> 2014-01-29 Aruna Kumar Pamulapati 
>> >>
>> >>> Hi,
>> >>>
>> >>> I think the misunderstanding you are having is about the lowercase
>> >>> factory:
>> >>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LowerCaseTokenizerFactory
>> >>>
>> >>> You are correct about KeywordTokenizerFactory, but the lowercase factory
>> >>> creates tokens by lowercasing all letters and dividing text at non-letters.