Re: facet.field counts when q includes field

2014-04-27 Thread Trey Grainger
No problem, Mike. Glad you got it sorted out.

Trey Grainger
Co-author, Solr in Action
Director of Engineering, Search & Analytics @ CareerBuilder


On Sun, Apr 27, 2014 at 7:23 PM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:

> On 4/27/14 7:02 PM, Michael Sokolov wrote:
>
>> On 4/27/2014 6:30 PM, Trey Grainger wrote:
>>
>>> So my question basically is: which restrictions are applied to the docset
>>> from which (field) facets are computed?
>>>
>>> Facets are generated based upon values found within the documents
>>> matching
>>> your "q=" parameter and also all of your "fq=" parameters. Basically, if
>>> you do an intersection of the docsets from all "q=" and "fq=" parameters
>>> then you end up with the docset the facet calculations are based upon.
>>>
>>> When you say "if I add type=book, *no* documents match, but I get facet
>>> counts: { chapter=4 }", I'm not exactly sure what you mean. If you are
>>> adding "q=toto&type=book&facet=true&facet.field=type" then the problem
>>> is
>>> that the "type=book" parameter doesn't do anything... it is not a valid
>>> Solr parameter for filtering here. In this case, all 4 of your documents
>>> matching the "q=toto" query are still being returned, which is why the
>>> facet count for chapters is 4.
>>>
>> In fact my query looks like:
>>
>> q=fulltext_t%3A%28toto%29+AND+dc_type_s%3A%28book%29+%
>> 2Bdirectory_b%3Afalse&start=0&rows=20&fl=uri%2Ctimestamp%
>> 2Cdirectory_b%2Csize_i%2Cmeta_ss%2Cmime_type_ss&facet.field=dc_type_s
>>
>> or without url encoding:
>>
>>  q=fulltext_t:(toto) AND dc_type_s:(book) +directory_b:false
>> facet.field=dc_type_s
>>
>> default operator is AND
>>
>>  ... so I don't think that the query is broken like you described?
>>
>> -Mike
>>
> OK the problem wasn't with the query, but while I tried to write out a
> clearer explanation, I found it -- an issue in a unit test too boring to
> describe.  Facets do seem to work like you said, and how they're
> documented, and as I assumed they did :)
>
> Thanks, and sorry for the noise.
>
> -Mike
>


Re: Application of different stemmers / stopword lists within a single field

2014-04-27 Thread Alexandre Rafalovitch
If you can throw money at the problem:
http://www.basistech.com/text-analytics/rosette/language-identifier/ .
Language Boundary Locator at the bottom of the page seems to be
part/all of your solution.

Otherwise, specifically for English and Arabic, you could play with
Unicode ranges to try detecting text blocks:
1) Create an UpdateRequestProcessor chain that
a) clones text into field_EN and field_AR.
b) applies regular-expression transformations that strip the Arabic or
English Unicode ranges respectively, so field_EN is left with only
English characters, etc. Of course, you need to decide what to do
with the occasional English or neutral characters in the middle of
Arabic text (numbers: Arabic or Indic? brackets, dashes, etc.). But
if you are just indexing the text, it might be OK even if it is not
perfect.
c) deletes empty fields, in case not all documents mix both languages
2) Use eDismax to search over both fields, each with its own processor.
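Step 1b above can be sketched in a few lines. This is a rough illustration, not anything from Solr itself: the exact Unicode ranges, the ASCII-only definition of "English", and the choice to drop digits and punctuation from both copies are all assumptions you would tune.

```python
import re

# Assumed ranges: basic Arabic block plus presentation forms; "English"
# here means ASCII letters only. Digits and punctuation are dropped from
# both copies -- one of the judgment calls mentioned above.
ARABIC_RUN = re.compile(r'[\u0600-\u06FF\u0750-\u077F\uFB50-\uFDFF\uFE70-\uFEFF]+')
LATIN_RUN = re.compile(r'[A-Za-z]+')

def clone_by_script(text):
    """Emulate the clone-then-strip step: return (field_EN, field_AR)."""
    field_en = ' '.join(LATIN_RUN.findall(text))
    field_ar = ' '.join(ARABIC_RUN.findall(text))
    return field_en, field_ar

en, ar = clone_by_script('The word كتاب means book')
print(en)  # The word means book
print(ar)  # كتاب
```

In a real deployment this logic would live inside the UpdateRequestProcessor chain (clone plus regex-replace processors) rather than in client code.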

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Fri, Apr 25, 2014 at 5:34 PM, Timothy Hill  wrote:
> This may not be a practically solvable problem, but the company I work for
> has a large number of lengthy mixed-language documents - for example,
> scholarly articles about Islam written in English but containing lengthy
> passages of Arabic. Ideally, we would like users to be able to search both
> the English and Arabic portions of the text, using the full complement of
> language-processing tools such as stemming and stopword removal.
>
> The problem, of course, is that these two languages co-occur in the same
> field. Is there any way to apply different processing to different words or
> paragraphs within a single field through language detection? Is this to all
> intents and purposes impossible within Solr? Or is another approach (using
> language detection to split the single large field into
> language-differentiated smaller fields, for example) possible/recommended?
>
> Thanks,
>
> Tim Hill


Re: How to sort solr results by foreign id field

2014-04-27 Thread Goosef_Le_Hung
So, what about the problem above?



-
Lady Cute
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-sort-solr-results-by-foreign-id-field-tp4133263p4133408.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: facet.field counts when q includes field

2014-04-27 Thread Michael Sokolov

On 4/27/14 7:02 PM, Michael Sokolov wrote:

On 4/27/2014 6:30 PM, Trey Grainger wrote:
So my question basically is: which restrictions are applied to the docset
from which (field) facets are computed?

Facets are generated based upon values found within the documents matching
your "q=" parameter and also all of your "fq=" parameters. Basically, if
you do an intersection of the docsets from all "q=" and "fq=" parameters
then you end up with the docset the facet calculations are based upon.

When you say "if I add type=book, *no* documents match, but I get facet
counts: { chapter=4 }", I'm not exactly sure what you mean. If you are
adding "q=toto&type=book&facet=true&facet.field=type" then the problem is
that the "type=book" parameter doesn't do anything... it is not a valid
Solr parameter for filtering here. In this case, all 4 of your documents
matching the "q=toto" query are still being returned, which is why the
facet count for chapters is 4.

In fact my query looks like:

q=fulltext_t%3A%28toto%29+AND+dc_type_s%3A%28book%29+%2Bdirectory_b%3Afalse&start=0&rows=20&fl=uri%2Ctimestamp%2Cdirectory_b%2Csize_i%2Cmeta_ss%2Cmime_type_ss&facet.field=dc_type_s
or without url encoding:

 q=fulltext_t:(toto) AND dc_type_s:(book) +directory_b:false
facet.field=dc_type_s

default operator is AND

 ... so I don't think that the query is broken like you described?

-Mike
OK the problem wasn't with the query, but while I tried to write out a 
clearer explanation, I found it -- an issue in a unit test too boring to 
describe.  Facets do seem to work like you said, and how they're 
documented, and as I assumed they did :)


Thanks, and sorry for the noise.

-Mike


Re: facet.field counts when q includes field

2014-04-27 Thread Michael Sokolov

On 4/27/2014 6:30 PM, Trey Grainger wrote:

So my question basically is: which restrictions are applied to the docset

from which (field) facets are computed?

Facets are generated based upon values found within the documents matching
your "q=" parameter and also all of your "fq=" parameters. Basically, if
you do an intersection of the docsets from all "q=" and "fq=" parameters
then you end up with the docset the facet calculations are based upon.

When you say "if I add type=book, *no* documents match, but I get facet
counts: { chapter=4 }", I'm not exactly sure what you mean. If you are
adding "q=toto&type=book&facet=true&facet.field=type" then the problem is
that the "type=book" parameter doesn't do anything... it is not a valid
Solr parameter for filtering here. In this case, all 4 of your documents
matching the "q=toto" query are still being returned, which is why the
facet count for chapters is 4.

In fact my query looks like:

q=fulltext_t%3A%28toto%29+AND+dc_type_s%3A%28book%29+%2Bdirectory_b%3Afalse&start=0&rows=20&fl=uri%2Ctimestamp%2Cdirectory_b%2Csize_i%2Cmeta_ss%2Cmime_type_ss&facet.field=dc_type_s

or without url encoding:

 q=fulltext_t:(toto) AND dc_type_s:(book) +directory_b:false
facet.field=dc_type_s

default operator is AND

 ... so I don't think that the query is broken like you described?

-Mike


Re: facet.field counts when q includes field

2014-04-27 Thread Trey Grainger
>> So my question basically is: which restrictions are applied to the docset
>> from which (field) facets are computed?

Facets are generated based upon values found within the documents matching
your "q=" parameter and also all of your "fq=" parameters. Basically, if
you do an intersection of the docsets from all "q=" and "fq=" parameters
then you end up with the docset the facet calculations are based upon.

When you say "if I add type=book, *no* documents match, but I get facet
counts: { chapter=4 }", I'm not exactly sure what you mean. If you are
adding "q=toto&type=book&facet=true&facet.field=type" then the problem is
that the "type=book" parameter doesn't do anything... it is not a valid
Solr parameter for filtering here. In this case, all 4 of your documents
matching the "q=toto" query are still being returned, which is why the
facet count for chapters is 4.

If instead you specify "q=toto&fq=type:book&facet=true&facet.field=type"
then this will filter down to ONLY the documents with a type of book. Since
it looks like in your data there are no documents which are both a type of
book and also match the "q=toto" query, you should get 0 documents and thus
the counts of all your facet values will be zero.

As you mentioned, it is possible to utilize tags and excludes to change the
behavior described above, but hopefully this answers your question about
the default behavior.
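For reference, the two behaviors can be shown side by side as request parameters (the host and collection name here are placeholders, and the tag name "t" is arbitrary):

```python
from urllib.parse import urlencode

base = 'http://localhost:8983/solr/collection1/select'  # placeholder URL

# An fq restricts both the result list and the facet counts:
filtered = urlencode({
    'q': 'toto',
    'fq': 'type:book',
    'facet': 'true',
    'facet.field': 'type',
})

# Tagging the filter and excluding it from the facet field lets the
# facet counts ignore that one filter, while the result list still
# honors it -- the tag/exclude mechanism mentioned above:
tagged = urlencode({
    'q': 'toto',
    'fq': '{!tag=t}type:book',
    'facet': 'true',
    'facet.field': '{!ex=t}type',
})

print(base + '?' + filtered)
print(base + '?' + tagged)
```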

Thanks,

Trey Grainger
Co-author, Solr in Action
Director of Engineering, Search & Analytics @ CareerBuilder


On Sun, Apr 27, 2014 at 4:51 PM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:

> I'm trying to understand the facet counts I'm getting back from Solr when
> the main query includes a term that restricts on a field that is being
> faceted.  After reading the docs on the wiki (both wikis) I'm confused.
>
> In my little test dataset, if I facet on "type" and use q=*:*, I get facet
> counts for type: [ chapter=5, book=1 ]
>
> With q=toto, only four of the chapters match, so I get facet counts for
> type: { chapter=4 } .
>
> Now if I add type=book, *no* documents match, but I get facet counts: {
> chapter=4 }.
>
> It's as if the type term from the query is being ignored when the facets
> are computed.  This is actually what we want, in general, but the
> documentation doesn't reflect it and I'd like to understand better the
> mechanism so I can tell what I can rely on.
>
> I see that there is the possibility of tagging and excluding filters (fq)
> so they don't affect the facet counting, but there's no mention on the wiki
> of any sort of term exclusion from the main query.  I poked around in the
> source a bit, but wasn't able to find an answer quickly, so I thought I'd
> ask here.
>
> So my question basically is: which restrictions are applied to the docset
> from which (field) facets are computed?
>
> -Mike
>
>
>


facet.field counts when q includes field

2014-04-27 Thread Michael Sokolov
I'm trying to understand the facet counts I'm getting back from Solr 
when the main query includes a term that restricts on a field that is 
being faceted.  After reading the docs on the wiki (both wikis) I'm 
confused.


In my little test dataset, if I facet on "type" and use q=*:*, I get 
facet counts for type: [ chapter=5, book=1 ]


With q=toto, only four of the chapters match, so I get facet counts for 
type: { chapter=4 } .


Now if I add type=book, *no* documents match, but I get facet counts: { 
chapter=4 }.


It's as if the type term from the query is being ignored when the facets 
are computed.  This is actually what we want, in general, but the 
documentation doesn't reflect it and I'd like to understand better the 
mechanism so I can tell what I can rely on.


I see that there is the possibility of tagging and excluding filters 
(fq) so they don't affect the facet counting, but there's no mention on 
the wiki of any sort of term exclusion from the main query.  I poked 
around in the source a bit, but wasn't able to find an answer quickly, 
so I thought I'd ask here.


So my question basically is: which restrictions are applied to the 
docset from which (field) facets are computed?


-Mike




Wildcard search not working with search term having special characters and digits

2014-04-27 Thread Geepalem
Hi,

The query below, without a wildcard, returns results:
http://localhost:8080/solr/master/select?q=page_title_t:"an-138"

But the query below, with a wildcard, returns no results:
http://localhost:8080/solr/master/select?q=page_title_t:"an-13*"

The query below, with a wildcard and no digits, returns results:
http://localhost:8080/solr/master/select?q=page_title_t:"an-*"

I have tried adding a WordDelimiter filter, but no luck.
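A likely explanation, assuming the field's analysis chain splits on punctuation (worth confirming with Solr's analysis page): wildcard queries are not analyzed, while the indexed title is. If "an-138" is indexed as separate terms, no single term ever starts with "an-13". A simplified simulation of that mismatch:

```python
import re

def index_time_tokens(text):
    """Very rough stand-in for a lowercasing, word-splitting analysis
    chain (WordDelimiterFilter-style): 'AN-138' -> ['an', '138']."""
    return [t for t in re.split(r'[^a-z0-9]+', text.lower()) if t]

tokens = index_time_tokens('AN-138')
print(tokens)  # ['an', '138']

# The wildcard pattern an-13* is matched against whole indexed terms
# with no analysis applied, so it needs a single term starting with
# 'an-13' -- and neither 'an' nor '138' qualifies. That is why the
# wildcard form fails while the analyzed query for "an-138" matches.
```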



Please suggest or guide how to make wildcard search works with special
characters and digits.

Appreciate immediate response!!

Thanks,
G. Naresh Kumar


 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-with-search-term-having-special-characters-and-digits-tp4133385.html
Sent from the Solr - User mailing list archive at Nabble.com.


Stemming not working with wildcard search

2014-04-27 Thread Geepalem
Hi,

I have added  SnowballPorterFilterFactory filter to field type to make
singular and plural search terms return same results.

So the queries below (double quotes around the search term) return similar
results, which is fine:

http://localhost:8080/solr/master/select?q=page_title_t:"product*"
http://localhost:8080/solr/master/select?q=page_title_t:"products*"

But when I analyzed the results, documents whose titles don't start with
the words "Product" or "Products" did not appear in either result set,
though a few such documents exist.

So I have added * as prefix and suffix to search term without double quotes
to do wildcard search.

http://localhost:8080/solr/master/select?q=page_title_t:*product*
http://localhost:8080/solr/master/select?q=page_title_t:*products*

Now stemming is not working: the second query above does not return the
same results as query 1.

If double quotes are added around the search term, the two queries return
similar results, but not the expected ones. With double quotes they won't
return results like "Old products", "New products", or "Cool Product";
they only return results with values like "Product 1", "Product 2", and
"Products of USA".
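The underlying issue is the usual wildcard one: stemming happens at index time, but wildcard queries skip the analysis chain. A toy illustration only — this trailing-'s' rule is a made-up stand-in, and real Snowball/Porter rules are far richer:

```python
def toy_stem(word):
    """Hypothetical simplification of a Porter-style stemmer: strip a
    trailing 's'. Real Snowball rules are much more involved."""
    return word[:-1] if word.endswith('s') and len(word) > 3 else word

# At index time both variants collapse to the same term:
print(toy_stem('products'))  # product
print(toy_stem('product'))   # product

# But the wildcard pattern *products* is matched verbatim against the
# stemmed index, which only ever contains 'product' -- so the plural
# pattern finds nothing, and the two queries diverge.
```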

Please suggest or guide how to make stemming work with wildcard search.


Appreciate immediate response!!

Thanks,
G. Naresh Kumar





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Stemming-not-working-with-wildcard-search-tp4133382.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Data Import Handler Question

2014-04-27 Thread Erick Erickson
This might be helpful: http://searchhub.org/2012/02/14/indexing-with-solrj/

It combines using Tika for structured documents and using a JDBC
connector, but extracting the DB-specific stuff should be quite easy.

Best,
Erick

On Sun, Apr 27, 2014 at 7:24 AM, Yuval Dotan  wrote:
> Thanks Shawn
>
> In your opinion, what do you think is easier, writing the importer from
> scratch or extending the DIH (for example: adding the state etc...)?
>
>
> Yuval
>
>
> On Thu, Apr 24, 2014 at 6:47 PM, Shawn Heisey  wrote:
>
>> On 4/24/2014 9:24 AM, Yuval Dotan wrote:
>>
>>> I want to use the DIH component in order to import data from old
>>> postgresql
>>> DB.
>>> I want to be able to recover from errors and crashes.
>>> If an error occurs I should be able to restart and continue indexing from
>>> where it stopped.
>>> Is the DIH good enough for my requirements ?
>>> If not is it possible to extend one of its classes in order to support the
>>> recovery?
>>>
>>
>> The entity in the Dataimport Handler (DIH) config has an "onError"
>> attribute.
>>
>> http://wiki.apache.org/solr/DataImportHandler#Schema_for_the_data_config
>> https://cwiki.apache.org/confluence/display/solr/
>> Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#
>> UploadingStructuredDataStoreDatawiththeDataImportHandler-EntityProcessors
>>
>> But honestly, if you want a really robust Java program that indexes to
>> Solr and does precisely what you want, you may be better off writing it
>> yourself using SolrJ and JDBC.  DIH is powerful and efficient, but when you
>> write the program yourself, you can do anything you want with your data.
>>
>> You also have the possibility of resuming an import after a Solr crash.
>>  Because DIH is embedded in Solr and doesn't save any kind of state data
>> about an import in progress, that's pretty much impossible with DIH.  With
>> a SolrJ program, you'd have to handle that yourself, but it would be
>> *possible*.
>>
>> https://cwiki.apache.org/confluence/display/solr/Using+SolrJ
>>
>> Thanks,
>> Shawn
>>
>>


Re: How to sort solr results by foreign id field

2014-04-27 Thread Erick Erickson
Store the sort criteria in the documents you want to sort. Solr is
_not_ a RDBMS, trying to do SQL-like things is usually a mistake, the
usual approach is to de-normalize your data so you don't need to try.
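The denormalization Erick describes amounts to copying the foreign record's sort key onto every document at index time, so Solr can sort without a join. A minimal sketch; the field and key names here are made up for illustration:

```python
# Hypothetical source data: a "foreign" table and the docs to index.
authors = {
    'a1': {'name': 'Adams'},
    'a2': {'name': 'Baker'},
}
docs = [
    {'id': 'd1', 'author_id': 'a2'},
    {'id': 'd2', 'author_id': 'a1'},
]

# Copy the sort criterion onto each document before sending it to Solr:
for doc in docs:
    doc['author_name_s'] = authors[doc['author_id']]['name']

# Now a plain Solr sort works: sort=author_name_s asc
ordered = sorted(docs, key=lambda d: d['author_name_s'])
print([d['id'] for d in ordered])  # ['d2', 'd1']
```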

Best
Erick

On Sun, Apr 27, 2014 at 6:11 AM, Goosef_Le_Hung  wrote:
> help me
>
>
>
> -
> Lady Cute
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-sort-solr-results-by-foreign-id-field-tp4133263p4133345.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: DocValues and StatsComponent

2014-04-27 Thread Ahmet Arslan
Hi Harish,

I created https://issues.apache.org/jira/browse/SOLR-6024 on your behalf.

Ahmet



On Friday, April 4, 2014 3:13 AM, Ahmet Arslan  wrote:
Hi Harish,

I re-produced your problem with example/default setup.

I enabled docValues on the example fields (deleted the original ones) and
indexed the example documents.

Single valued fields work fine. But stats on multi-valued field cat yields 

http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true&stats=true&stats.field=cat


"msg": "Type mismatch: cat was indexed as SORTED_SET", "code": 400

And confluence does not say anything about this.

Can you file a jira issue?

Ahmet


On Thursday, April 3, 2014 11:01 PM, Harish Agarwal  
wrote:
Is there a known issue using the StatsComponent against fields indexed with
docvalues?  My setup is currently throwing this error (against the latest
nightly build):

org.apache.solr.common.SolrException: Type mismatch: INTEGER_4 was indexed
as SORTED_SET



Re: Data Import Handler Question

2014-04-27 Thread Yuval Dotan
Thanks Shawn

In your opinion, what do you think is easier, writing the importer from
scratch or extending the DIH (for example: adding the state etc...)?


Yuval


On Thu, Apr 24, 2014 at 6:47 PM, Shawn Heisey  wrote:

> On 4/24/2014 9:24 AM, Yuval Dotan wrote:
>
>> I want to use the DIH component in order to import data from old
>> postgresql
>> DB.
>> I want to be able to recover from errors and crashes.
>> If an error occurs I should be able to restart and continue indexing from
>> where it stopped.
>> Is the DIH good enough for my requirements ?
>> If not is it possible to extend one of its classes in order to support the
>> recovery?
>>
>
> The entity in the Dataimport Handler (DIH) config has an "onError"
> attribute.
>
> http://wiki.apache.org/solr/DataImportHandler#Schema_for_the_data_config
> https://cwiki.apache.org/confluence/display/solr/
> Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#
> UploadingStructuredDataStoreDatawiththeDataImportHandler-EntityProcessors
>
> But honestly, if you want a really robust Java program that indexes to
> Solr and does precisely what you want, you may be better off writing it
> yourself using SolrJ and JDBC.  DIH is powerful and efficient, but when you
> write the program yourself, you can do anything you want with your data.
>
> You also have the possibility of resuming an import after a Solr crash.
>  Because DIH is embedded in Solr and doesn't save any kind of state data
> about an import in progress, that's pretty much impossible with DIH.  With
> a SolrJ program, you'd have to handle that yourself, but it would be
> *possible*.
>
> https://cwiki.apache.org/confluence/display/solr/Using+SolrJ
>
> Thanks,
> Shawn
>
>
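The "state data" Shawn mentions could be as simple as a checkpoint file recording the last primary key successfully committed, so a restarted importer resumes past it. A sketch under assumed names — the file name and the idea of an increasing id column are illustrative, not anything DIH or SolrJ provides:

```python
import json
import os

STATE_FILE = 'import_state.json'  # hypothetical checkpoint location

def load_checkpoint():
    """Return the last committed id, or 0 on a fresh start."""
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f)['last_id']
    return 0

def save_checkpoint(last_id):
    """Persist progress after each successfully committed batch."""
    with open(STATE_FILE, 'w') as f:
        json.dump({'last_id': last_id}, f)

# The import loop would then fetch rows WHERE id > load_checkpoint(),
# send the batch to Solr, commit, and call save_checkpoint(batch_max_id).
save_checkpoint(42)
print(load_checkpoint())  # 42
```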


Re: '0' Status: Communication Error

2014-04-27 Thread Karunakar Reddy
Hey Naresh,

A few things that may be wrong:
1) Your application is not pointed at the correct Solr (change config.ini).
2) The new Solr machine is not reachable from your application environment
(run this command in a terminal from the application environment to check
the status of the port/IP: telnet IP_ADDRESS 8983).
Hope this helps!
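The telnet check suggested in point 2 can also be scripted; this is just a plain TCP connect test, and the host name below is a placeholder:

```python
import socket

def solr_reachable(host, port=8983, timeout=3.0):
    """Equivalent of 'telnet IP_ADDRESS 8983': can we open a TCP
    connection to the Solr port from this environment?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# solr_reachable('solr.example.internal')  # placeholder Solr host
```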


On Sat, Apr 26, 2014 at 5:33 PM, Naresh  wrote:

> I've got this problem that I can't solve. Partly because I can't explain it
> with the right terms. I'm new to this so sorry for this clumsy question.
>
> Below you can see an overview of my goal.
>
> I'm using Magento CE1.7.0.2 & Solr 4.6.0.
>
> I'm using the Magentix/Solr extension in Magento CE 1.7.0.2; it works
> fine and I can get the response in a max of 2 secs. (There, the Solr
> server was placed near my Magento install.)
>
> But now I have placed Solr on a separate server; I don't want to put
> all these things on one server.
>
>  Enable Search  : Yes
>  Enable Index   : Yes
>  Host   : IP address of Solr file existing server
>  Port   : 8983
>  Path   : /solr
>  Search limit   : 100
>
> But the Solr logs are not showing any details, though they should show
> some log details and the time taken for re-indexing data, etc.
>
> And the Solr.log file shows: ERR (3): '0' Status: Communication Error.
>
> Is there anything wrong with what I did here?
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/0-Status-Communication-Error-tp4133265.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: How to sort solr results by foreign id field

2014-04-27 Thread Goosef_Le_Hung
help me



-
Lady Cute
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-sort-solr-results-by-foreign-id-field-tp4133263p4133345.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How can I convert xml message for updating a Solr index to a javabin file

2014-04-27 Thread Jack Krupansky

Look at the SolrJ source code and doc.

JavaBin is more of a protocol than a file format.

-- Jack Krupansky

-Original Message- 
From: Elran Dvir

Sent: Sunday, April 27, 2014 2:16 AM
To: solr-user@lucene.apache.org
Subject: RE: How can I convert xml message for updating a Solr index to a 
javabin file


Does anyone know a way to do this?

Thanks.

-Original Message-
From: Elran Dvir
Sent: Thursday, April 24, 2014 4:11 PM
To: solr-user@lucene.apache.org
Subject: RE: How can I convert xml message for updating a Solr index to a 
javabin file


I want to measure xml vs javabin update message indexing performance.

-Original Message-
From: Upayavira [mailto:u...@odoko.co.uk]
Sent: Thursday, April 24, 2014 2:04 PM
To: solr-user@lucene.apache.org
Subject: Re: How can I convert xml message for updating a Solr index to a 
javabin file


Why would you want to do this? Javabin is used by SolrJ to communicate with 
Solr. XML is good enough for communicating from the command line/curl, as is 
JSON. Attempting to use javabin just seems to add an unnecessary 
complication.


Upayavira

On Thu, Apr 24, 2014, at 10:20 AM, Elran Dvir wrote:

Hi all,
Is there a way I can convert an XML Solr update message file to a javabin
file? If so, how?
How can I use curl to update Solr with a javabin message file?

Thank you very much.

