Re: Keepwords DataImportHandler

2014-12-18 Thread leostro
Hi all,

You are right: I was doing everything correctly, but I wasn't using facets
to check the result.
I was mixing up indexing and analysis.
Now I'm working on the next problem: keepwords that consist of more than
one word... but that is another problem :)

Thank you all, your hints were invaluable!
Leo




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Keepwords-DataImportHandler-tp4174699p4174941.html
Sent from the Solr - User mailing list archive at Nabble.com.


Keepwords DataImportHandler

2014-12-17 Thread leostro
Hi all,

This is my first question in this forum :D
I'm trying to import documents using a DataImportHandler.

<document>
  <entity name="entry" query="select top 100 id, title from entry order by id desc">
  </entity>
</document>

The first test is to import some documents that have only a title. I want
to index this field as a standard text type value.
I'd also like to use a KeepWordFilter to search these title fields for the
words I have listed in a file, and to put the words found into a second
field named tags, so I added this line to my configuration:

<copyField source="title" dest="brands" />

For example, assume I've specified Nintendo in my keepwords file. If I
import a document with the title "Nintendo NES", I'd like the resulting
document to have two fields:

title --> Nintendo NES
tags --> Nintendo

At the moment I have two fields with the same value: Nintendo NES
If I use the Analysis section of the Solr admin panel, the configuration in
my schema.xml seems to be correct:

ST   nintendo NES
KWF  nintendo
LCF  nintendo
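
The ST / KWF / LCF steps above can be approximated outside Solr with a small
Python sketch. This is not Solr code: the function names, the whitespace
split standing in for the standard tokenizer, and the ignore_case flag are
all assumptions made for illustration only.

```python
# Rough simulation of the analysis chain shown above (ST -> KWF -> LCF).
# All names are hypothetical; real Solr analysis is configured in schema.xml.

def standard_tokenize(text):
    # Stand-in for the standard tokenizer: split on whitespace only.
    return text.split()

def keep_words(tokens, keep, ignore_case=True):
    # Keep only the tokens that appear in the keep-word list.
    if ignore_case:
        keep_lower = {w.lower() for w in keep}
        return [t for t in tokens if t.lower() in keep_lower]
    keep_set = set(keep)
    return [t for t in tokens if t in keep_set]

def lowercase(tokens):
    # Stand-in for the lowercase filter.
    return [t.lower() for t in tokens]

def analyze(text, keep):
    # Apply the three steps in the same order as the admin panel output.
    return lowercase(keep_words(standard_tokenize(text), keep))

print(analyze("Nintendo NES", ["Nintendo"]))  # prints ['nintendo']
```

Because the keep-word step runs before lowercasing here, a case-sensitive
match (ignore_case=False) would require the file entry and the title token
to agree in case.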

So... I'd like to understand whether I'm using DataImportHandler in the
wrong way, or whether I need to change something to obtain the behaviour I
described above.

Hope someone can help me,
regards
leo

 






Re: Keepwords DataImportHandler

2014-12-17 Thread Ahmet Arslan
Hi Leo,

You are doing OK. DIH and analysis are separate issues.

Please note that analysis changes only the indexed values, as you see in
the Analysis section of the Solr admin panel.
When you retrieve stored values using the fl= parameter, the original
values are displayed.

So the natural question is: how are you going to consume the tags field? If
you are going to facet on it, what you have done is sufficient. E.g. you
won't see NES in the facets.
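
To make the stored-versus-indexed distinction concrete, here is a minimal
Python sketch, not Solr itself. It assumes that the copy field receives the
raw title, that stored values are kept verbatim, and that facet counts are
computed over the indexed terms; every function and variable name is
hypothetical.

```python
from collections import Counter

KEEP = {"nintendo"}  # assumed keep-word list, already lowercased

def index_terms(raw):
    # Assumed analysis for the copy field: split, lowercase, keep listed words.
    return [t for t in raw.lower().split() if t in KEEP]

def add_document(title):
    # The copy field gets the raw input, so both fields store the same text;
    # only the indexed terms of the tags field are filtered.
    return {
        "stored": {"title": title, "tags": title},
        "indexed_tags": index_terms(title),
    }

def facet_counts(docs):
    # Facets are counted over indexed terms, not over stored values.
    counts = Counter()
    for doc in docs:
        counts.update(set(doc["indexed_tags"]))
    return dict(counts)

docs = [add_document("Nintendo NES"), add_document("Nintendo 64")]
print(docs[0]["stored"]["tags"])  # what fl=tags would return: Nintendo NES
print(facet_counts(docs))         # what faceting would show: {'nintendo': 2}
```

The stored value still reads "Nintendo NES", while the facet output contains
only the kept term, which matches the behaviour described above.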

Ahmet





Re: Keepwords DataImportHandler

2014-12-17 Thread Doug Turnbull
Leo, everything you describe sounds correct. Are you having any problems?
Are keep words not working with DIH for you?

Or are you just looking for general pointers?

If so, your approach to this sounds a lot like a blog post I recently
wrote, which you might find useful:
http://opensourceconnections.com/blog/2014/12/08/title-search-when-relevancy-is-only-skin-deep/

You might also find this plugin from Ted Sullivan at LucidWorks handy for
this kind of work:
http://lucidworks.com/blog/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/

Cheers,
Doug




-- 
Doug Turnbull
Search & Big Data Architect
OpenSource Connections http://o19s.com