Re: problem indexing with my analyzer

Tanguy Bernard Fri, 20 Jun 2014 02:26:31 -0700

Yes, I am applying "reuters" to my document (composed of text and pictures).
My goal is to search the text of the document by any word or 
part of a word.


Yes, the problem is my nGram filter.
How do I solve it? Decrease the nGram max? Or switch to another analyzer 
that still satisfies my goal?
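
To illustrate why the filter is a plausible culprit, here is a rough sketch of how many grams a single token can produce. countNgrams is a hypothetical helper written for this message, not an Elasticsearch API, and the 255-character case assumes a token as long as the standard tokenizer's default max_token_length.

```php
<?php
// Hypothetical helper (not part of Elasticsearch): counts the n-grams an
// nGram token filter would emit for one token of $length characters.
function countNgrams(int $length, int $minGram, int $maxGram): int
{
    $total = 0;
    for ($n = $minGram; $n <= min($maxGram, $length); $n++) {
        // A token of $length characters has ($length - $n + 1) substrings of size $n.
        $total += $length - $n + 1;
    }
    return $total;
}

echo countNgrams(8, 3, 250), "\n";   // 21 grams for an ordinary 8-letter word
echo countNgrams(255, 3, 250), "\n"; // 32116 grams for one 255-character token
```

An ordinary word stays small, but each long run of base64 characters can expand into tens of thousands of grams, which would fit the memory warnings.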

On Friday, 20 June 2014 10:58:49 UTC+2, Cédric Hourcade wrote:
>
> Does it mean you're applying the "reuters" analyzer on your base64 
> encoded pictures? 
>
> I guess it generates a really huge number of tokens for each entry 
> because of your nGram filter (with a max at 250). 
>
> Cédric Hourcade 
> c...@wal.fr 
>
>
> On Fri, Jun 20, 2014 at 9:09 AM, Tanguy Bernard 
> <bernardt...@gmail.com> wrote: 
> > Information 
> > My "note_source" contains pictures (.jpg, .png ...) in base64 and text. 
> > 
> > For my mapping I used: 
> > "type" => "string" 
> > "analyzer" => "reuters" (the name of my analyzer) 
> > 
> > 
> > Any ideas? 
> > 
> > On Thursday, 19 June 2014 17:57:46 UTC+2, Tanguy Bernard wrote: 
> >> 
> >> Hello, 
> >> I have an issue when I index one particular field, "note_source" (SQL 
> >> longtext). 
> >> I use the same analyzer for every field (except date_source and 
> >> id_source), but for "note_source" I get a "warn monitor.jvm" message. 
> >> When I remove "note_source", everything is fine. If I don't use the 
> >> analyzer on "note_source", everything is fine, but if I use my analyzer 
> >> on "note_source" it crashes. 
> >> 
> >> I think I have enough memory; I have set ES_HEAP_SIZE. 
> >> Maybe my problem is with accents (ASCII, UTF-8). 
> >> 
> >> Can you help me with this? 
> >> 
> >> 
> >> 
> >> My Setting 
> >> 
> >>  public function createSetting($pf){ 
> >>         $params = array('index' => $pf, 'body' => array( 
> >>         'settings' => array( 
> >>             'number_of_shards' => 5, 
> >>             'number_of_replicas' => 0, 
> >>             'analysis' => array( 
> >>                 'filter' => array( 
> >>                     'nGram' => array( 
> >>                         "token_chars" =>array(), 
> >>                         "type" => "nGram", 
> >>                         "min_gram" => 3, 
> >>                         "max_gram"  => 250 
> >>                     ) 
> >>                 ), 
> >>                 'analyzer' => array( 
> >>                     'reuters' => array( 
> >>                         'type' => 'custom', 
> >>                         'tokenizer' => 'standard', 
> >>                         'filter' => array('lowercase', 'asciifolding', 'nGram') 
> >>                     ) 
> >>                 ) 
> >>             ) 
> >>         ) 
> >>         )); 
> >>         $this->elasticsearchClient->indices()->create($params); 
> >>         return; 
> >> } 
> >> 
> >> 
> >> My Indexing 
> >> 
> >> public function indexTable($pf,$typeElement){ 
> >> 
> >>         $params =array( 
> >>             "index" =>'_river', 
> >>             "type" => $typeElement, 
> >>             "id" => "_meta", 
> >>             "body" =>array( 
> >> 
> >>                 "type" => "jdbc", 
> >>                 "jdbc" => array( 
> >>                     "url" => "jdbc:mysql://ip/name", 
> >>                     "user" => 'root', 
> >>                     "password" => 'mdp', 
> >>                     "index" => $pf, 
> >>                     "type" => $typeElement, 
> >>                     "sql" => "select id_source as _id, id_sous_theme, " 
> >>                         . "titre_source, desc_source, note_source, " 
> >>                         . "adresse_source, type_source, date_source " 
> >>                         . "from source", 
> >>                     "max_bulk_requests" => 5, 
> >>                     ) 
> >>             ) 
> >> 
> >>         ); 
> >> 
> >> 
> >>         $this->elasticsearchClient->index($params); 
> >> } 
> >> 
> >> Thanks in advance. 
> > 
> > -- 
> > You received this message because you are subscribed to the Google Groups 
> > "elasticsearch" group. 
> > To unsubscribe from this group and stop receiving emails from it, send an 
> > email to elasticsearc...@googlegroups.com. 
> > To view this discussion on the web visit 
> > https://groups.google.com/d/msgid/elasticsearch/5d93217c-bded-40fa-8fd2-fdac576c57ee%40googlegroups.com. 
> > For more options, visit https://groups.google.com/d/optout. 
>
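
As one possible reading of the "decrease nGram max" idea, the analysis settings quoted above could be kept as they are with only a smaller max_gram. This is a sketch, not a tested fix: the value 20 is an assumption chosen for illustration, and the unused "token_chars" entry is left out.

```php
<?php
// Sketch only: same structure as the analysis block in createSetting(),
// with a smaller max_gram. The value 20 is an assumption, not a recommendation.
$analysis = array(
    'filter' => array(
        'nGram' => array(
            'type' => 'nGram',
            'min_gram' => 3,
            'max_gram' => 20, // assumed cap; the original used 250
        ),
    ),
    'analyzer' => array(
        'reuters' => array(
            'type' => 'custom',
            'tokenizer' => 'standard',
            'filter' => array('lowercase', 'asciifolding', 'nGram'),
        ),
    ),
);
```

A lower cap reduces the number of grams per token, but a search term longer than max_gram would then have to be matched through its shorter grams. Keeping the base64 picture data out of the analyzed field would also need to be considered separately.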

