EdgeNGramFilterFactory not working? Solr 5.3.1

2015-11-17 Thread Daniel Valdivia
Hi,

I'm trying to get the EdgeNGramFilterFactory filter to work on a certain field. 
However, after defining the fieldType, creating a field for it and adding a 
copy field from the source, it doesn't seem to be working.

One catch, which may or may not be affecting the outcome: none of my fields are 
stored. Everything in my index except the document id is stored=false.

I'm using Solr 5.3.1. I know the word "incident" is present in my corpus and I 
can search for it, but searching for "inci" yields no results:

http://localhost:8983/solr/superCore/select?q=inci=record_display_name=json=true

Any idea what I could be doing wrong?

This is how I define the field type

{
  "add-field-type" : {
    "indexed" : true,
    "queryAnalyzer" : {
      "filters" : [
        {
          "class" : "solr.LowerCaseFilterFactory"
        }
      ],
      "tokenizer" : {
        "class" : "solr.WhitespaceTokenizerFactory"
      }
    },
    "indexAnalyzer" : {
      "filters" : [
        {
          "class" : "solr.LowerCaseFilterFactory"
        },
        {
          "class" : "solr.EdgeNGramFilterFactory",
          "minGramSize" : "2",
          "maxGramSize" : "10"
        }
      ],
      "tokenizer" : {
        "class" : "solr.WhitespaceTokenizerFactory"
      }
    },
    "stored" : false,
    "name" : "prefix",
    "class" : "solr.TextField"
  }
}
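
For reference, here is a rough sketch of the prefixes this index-time chain 
should produce for "incident" with minGramSize 2 and maxGramSize 10. It is 
plain Python that imitates the filter's behaviour, not Solr code, and 
edge_ngrams is a made-up helper name.

```python
def edge_ngrams(token, min_gram=2, max_gram=10):
    """Return the leading-edge n-grams of a token, shortest first.

    Imitates what an edge n-gram filter emits for one lowercased token;
    it is an illustration only, not the Solr implementation.
    """
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


grams = edge_ngrams("incident")
print(grams)
```

Since the query analyzer above only lowercases, a query term such as "inci" 
should match one of these indexed prefixes, provided the query is run against 
a field that uses this type.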

Adding the field

{
  "add-field":{
 "name":"dispNamePrefix",
 "type":"prefix",
 "stored":false }
}

Copy field

{
  "add-copy-field":{
 "source":"record_display_name",
 "dest":[ "dispNamePrefix"]}
}
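
As a sketch of how the last two commands could be sent together: the Schema 
API accepts several commands in one POST to the core's /schema endpoint. The 
host and core name are taken from the select URL above; the request is only 
built here, and the lines that would send it are left commented out.

```python
import json
import urllib.request

# Host and core name come from the select URL earlier in this message.
SCHEMA_URL = "http://localhost:8983/solr/superCore/schema"

# Both schema commands from this message, combined into one payload.
commands = {
    "add-field": {
        "name": "dispNamePrefix",
        "type": "prefix",
        "stored": False,
    },
    "add-copy-field": {
        "source": "record_display_name",
        "dest": ["dispNamePrefix"],
    },
}

request = urllib.request.Request(
    SCHEMA_URL,
    data=json.dumps(commands).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment to send against a running Solr instance:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Copy fields are applied at index time, so documents indexed before this 
change would need to be indexed again to populate dispNamePrefix.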

Re: EdgeNGramFilterFactory not working? Solr 5.3.1

2015-11-17 Thread Daniel Valdivia
Hi Markus,

I did, every time I run this experiment I start from 0 :)

However, after my last change it seems I forgot to commit, which is why I 
couldn't get results. Now I do get some.

The resolution to this problem was to run the search against the 
dispNamePrefix field :O
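
A minimal sketch of what that query could look like. The host and core name 
are the ones from my original message; the wt and indent parameters and the 
prefix_query_url helper are assumptions for illustration.

```python
from urllib.parse import urlencode

# Host and core name as used in the original message.
SELECT_URL = "http://localhost:8983/solr/superCore/select"


def prefix_query_url(prefix):
    """Build a select URL that searches the n-grammed field explicitly."""
    params = {
        "q": "dispNamePrefix:" + prefix,  # name the field instead of relying on the default
        "wt": "json",                     # assumed response format
        "indent": "true",                 # assumed, for readable output
    }
    return SELECT_URL + "?" + urlencode(params)


url = prefix_query_url("inci")
print(url)
```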

Thanks Markus and Alexandre

> On Nov 17, 2015, at 3:40 PM, Markus Jelsma  wrote:
> 
> Hi - the usual suspect is: 'did you reindex?' Not seeing things change after 
> modifying index-time analysis chains means you need to reindex.
> 
> M.



Re: Kate Winslet vs Winslet Kate

2015-11-01 Thread Daniel Valdivia
Perhaps

q=name:("Kate AND Winslet")

q=name:("Kate Winslet")
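
To make the difference concrete, here is a small sketch that builds the 
field-scoped forms discussed further down in the thread. field_query and 
field_phrase are made-up helper names, and the slop value is only an example 
in the spirit of the "Winslet Kate"~99 query mentioned below.

```python
def field_query(field, terms, operator="AND"):
    """Group all terms under one field, e.g. name:(Kate AND Winslet)."""
    joined = f" {operator} ".join(terms)
    return f"{field}:({joined})"


def field_phrase(field, phrase, slop=None):
    """Build a phrase query on one field, optionally with a slop suffix."""
    suffix = f"~{slop}" if slop is not None else ""
    return f'{field}:"{phrase}"{suffix}'


grouped = field_query("name", ["Kate", "Winslet"])
sloppy = field_phrase("name", "Kate Winslet", slop=2)
print(grouped)
print(sloppy)
```

The grouped form keeps both terms on the name field, whereas 
name:Kate AND Winslet would send the second term to the default search field.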

Sent from my iPhone

> On Oct 31, 2015, at 10:21 PM, Yangrui Guo wrote:
> 
> Thanks for the reply. Putting name: before the terms did the trick. I
> just wanted to generalize the search query because users might be
> interested in querying Kate Winslet herself or her movies. If a user enters
> the query string "Kate Winslet movie", the query q=name:(Kate AND Winslet
> AND movie) will return nothing.
> 
> Yangrui Guo
> 
> On Saturday, October 31, 2015, Erick Erickson wrote:
> 
>> There are a couple of anomalies here.
>> 
>> 1> kate AND winslet
>> What does the query look like if you add =true to the statement
>> and look at the "parsed_query" section of the return?  My guess is you
>> typed "q=name:kate AND winslet" which parses as "q=name:kate AND
>> default_search_field:winslet" and are getting matches you don't
>> expect. You need something like "q=name:(kate AND winslet)" or
>> "q=name:kate AND name:winslet". Note that if you're using edismax it's
>> more complicated, but that should still honor the intent.
>> 
>> 2> I have no idea why searching for "Kate Winslet" in quotes returns
>> anything, I wouldn't expect it to unless you mean you type in "q=kate
>> winslet" which is searching against your default field, not the name
>> field.
>> 
>> Best,
>> Erick
>> 
>> On Sat, Oct 31, 2015 at 8:52 PM, Yangrui Guo wrote:
>>> Hi, today I found an interesting aspect of Solr. I imported IMDB data into
>>> Solr. IMDB puts the last name before the first name in its person name
>>> field, e.g. "Winslet, Kate". When I search "Winslet Kate" with quotation
>>> marks I get the exact result. However, if I search "Kate Winslet" or Kate
>>> AND Winslet, Solr seems to return all results containing either Kate or
>>> Winslet, which is similar to "Winslet Kate"~99. From a user's perspective
>>> I certainly want Solr to treat Kate Winslet the same as Winslet Kate. Is
>>> there any way to make Solr score higher for terms in the same field?
>>> 
>>> Yangrui
>> 


Re: Solr for Pictures

2015-10-29 Thread Daniel Valdivia
Some extra googling yielded this wiki page on an integration between Tika and 
ExifTool:

https://wiki.apache.org/tika/EXIFToolParser

> On Oct 29, 2015, at 1:48 PM, Daniel Valdivia <h...@danielvaldivia.com> wrote:
> 
> I think you can look into Tika for this: https://tika.apache.org/
> 
> There are handlers to integrate Tika and Solr. Some context:
> 
> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
> 
>> On Oct 29, 2015, at 1:47 PM, Rallavagu <rallav...@gmail.com> wrote:
>> 
>> In general, is there a built-in data handler to index pictures (essentially, 
>> EXIF and other data embedded in an image)? If not, what is the best practice 
>> to do so? Thanks.
> 



Re: Solr for Pictures

2015-10-29 Thread Daniel Valdivia
I think you can look into Tika for this: https://tika.apache.org/

There are handlers to integrate Tika and Solr. Some context:

https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
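
A rough sketch of what a Solr Cell request for one image might look like, 
assuming the /update/extract handler is configured in solrconfig.xml. The core 
name "pictures", the document id and the extract_url helper are hypothetical; 
literal.id, uprefix and commit are Solr Cell request parameters. Which EXIF 
fields actually come back depends on what Tika extracts from the file.

```python
from urllib.parse import urlencode

# Hypothetical core name; the handler path assumes Solr Cell is enabled.
EXTRACT_URL = "http://localhost:8983/solr/pictures/update/extract"


def extract_url(doc_id):
    """Build the URL an image file would be POSTed to for extraction."""
    params = {
        "literal.id": doc_id,  # unique key for the new document
        "uprefix": "attr_",    # prefix for extracted fields missing from the schema
        "commit": "true",      # make the document searchable right away
    }
    return EXTRACT_URL + "?" + urlencode(params)


url = extract_url("photo-001")
print(url)
```

The image itself would then be sent as the request body, for example with 
curl -F "myfile=@photo.jpg" and the URL above.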
 




> On Oct 29, 2015, at 1:47 PM, Rallavagu  wrote:
> 
> In general, is there a built-in data handler to index pictures (essentially, 
> EXIF and other data embedded in an image)? If not, what is the best practice 
> to do so? Thanks.



Best strategy for indexing multiple tables with multiple fields

2015-10-26 Thread Daniel Valdivia
Hi, I’m new to the Solr world and in need of some experienced advice. I see I 
can do a lot of cool stuff with Solr, but I’m not sure which path to take so I 
don’t shoot myself in the foot with all this power :P

I have several tables (225) in my application, which I’d like to add into a 
single index (multiple types of documents in the same index, each with a unique 
id). However, each table has a different number of columns, from 5 to 30. Do 
you recommend indexing each column separately, or joining all columns into a 
single “big document”?

I’m trying to provide my users with a simple experience where they type their 
query into a single search box and I list all the documents across the 
different tables that match. I’m not sure whether that strategy is the best, or 
whether a core per table would be better.

These are the strategies I’ve considered so far:

- unique_id, table, megafield: all of the columns in a record get mixed into a 
  single megafield, which is indexed (cons: no faceting?)
- A core per table: each table gets its own core and all the fields get indexed 
  (except numbers and foreign keys). I’m not sure if having 200 cores will play 
  nice with Solr
- Single core, all fields indexed (possibly 1,000’s of columns): this sounds 
  expensive and not so efficient to me
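
To make the first strategy concrete, here is a minimal sketch of how one table 
row could be flattened into such a document. The field names unique_id, table 
and megafield are the ones listed above; the row_to_doc helper, the table name 
and the sample row are invented for illustration.

```python
def row_to_doc(table, primary_key, row):
    """Flatten one table row into a unique_id / table / megafield document."""
    # Skip NULL columns so they do not end up as the text "None".
    values = [str(value) for value in row.values() if value is not None]
    return {
        "unique_id": f"{table}:{primary_key}",  # unique across all tables
        "table": table,                         # kept separate from the text
        "megafield": " ".join(values),          # every column value in one field
    }


doc = row_to_doc(
    "incidents",
    42,
    {"title": "Printer on fire", "priority": None, "owner": "dvaldivia"},
)
print(doc)
```

Keeping the table name in its own field should at least allow filtering or 
faceting by source table, even though the individual columns are merged.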

My application has around 2M records

Thanks in advance for any advice.

Cheers