Yes, I am applying the "reuters" analyzer to my document (composed of text and pictures). My goal is to be able to search the text of the document for any word or part of a word.
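Cédric's point about the nGram filter can be made concrete with a little arithmetic. The nGram token filter emits one gram for every substring of a token whose length lies between min_gram and max_gram, so a token of n characters yields the sum over k from min_gram to min(max_gram, n) of (n - k + 1) grams. The PHP sketch below counts them. The helper name countNgrams is hypothetical (it is not part of Elasticsearch or its PHP client), the 250-character token is an assumption for illustration (how long the tokens cut from base64 data really are depends on the tokenizer and on the data), and max_gram 10 is just an example value for comparison.

```php
<?php
// Number of grams an nGram token filter emits for ONE token of $len
// characters: one gram per substring whose length is in [$minGram, $maxGram].
// countNgrams is a hypothetical helper for illustration only.
function countNgrams(int $len, int $minGram, int $maxGram): int
{
    $total = 0;
    // Substring lengths longer than the token itself produce nothing.
    for ($k = $minGram; $k <= min($maxGram, $len); $k++) {
        $total += $len - $k + 1; // substrings of length $k in the token
    }
    return $total;
}

// Assumed 250-character token, e.g. a long run of base64 characters.
echo countNgrams(250, 3, 250), PHP_EOL; // 30876 (min_gram 3, max_gram 250)
echo countNgrams(250, 3, 10), PHP_EOL;  // 1956  (min_gram 3, max_gram 10)
// An ordinary 10-letter word costs the same under both settings.
echo countNgrams(10, 3, 250), PHP_EOL;  // 36
echo countNgrams(10, 3, 10), PHP_EOL;   // 36
```

Under these assumptions a single long token costs roughly 30,000 grams with max_gram at 250, against about 2,000 with max_gram at 10, while normal words are unaffected, which is why a field full of base64 data is so much more expensive than plain text.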
Yes, the problem is my nGram filter. How do I solve this problem? Decrease the nGram max_gram? Replace the analyzer with another one that still satisfies my goal?

On Friday, June 20, 2014 10:58:49 AM UTC+2, Cédric Hourcade wrote:
>
> Does it mean you're applying the "reuters" analyzer to your base64
> encoded pictures?
>
> I guess it generates a really huge number of tokens for each entry
> because of your nGram filter (with a max at 250).
>
> Cédric Hourcade
> c...@wal.fr
>
>
> On Fri, Jun 20, 2014 at 9:09 AM, Tanguy Bernard
> <bernardt...@gmail.com> wrote:
> > Information
> > My "note_source" contains pictures (.jpg, .png ...) in base64 and text.
> >
> > For my mapping I have used:
> > "type" => "string"
> > "analyzer" => "reuters" (the name of my analyzer)
> >
> >
> > Any idea?
> >
> > On Thursday, June 19, 2014 5:57:46 PM UTC+2, Tanguy Bernard wrote:
> >>
> >> Hello
> >> I have an issue when I index one particular field, "note_source" (SQL
> >> longtext).
> >> I use the same analyzer for every field (except date_source and
> >> id_source), but for "note_source" I get a "warn monitor.jvm".
> >> When I remove "note_source", everything is fine. If I don't use an
> >> analyzer on "note_source", everything is fine, but if I use my analyzer
> >> on "note_source", it crashes.
> >>
> >> I think I have enough memory; I have set ES_HEAP_SIZE.
> >> Maybe my problem is with accents (ASCII, UTF-8).
> >>
> >> Can you help me with this?
> >>
> >>
> >>
> >> My Setting
> >>
> >> public function createSetting($pf){
> >>     $params = array('index' => $pf, 'body' => array(
> >>         'settings' => array(
> >>             'number_of_shards' => 5,
> >>             'number_of_replicas' => 0,
> >>             'analysis' => array(
> >>                 'filter' => array(
> >>                     'nGram' => array(
> >>                         // token_chars is an nGram *tokenizer* option;
> >>                         // the nGram token filter does not use it.
> >>                         "token_chars" => array(),
> >>                         "type" => "nGram",
> >>                         // every token is expanded into all of its
> >>                         // substrings of 3 to 250 characters
> >>                         "min_gram" => 3,
> >>                         "max_gram" => 250
> >>                     )
> >>                 ),
> >>                 'analyzer' => array(
> >>                     'reuters' => array(
> >>                         'type' => 'custom',
> >>                         'tokenizer' => 'standard',
> >>                         'filter' => array('lowercase', 'asciifolding', 'nGram')
> >>                     )
> >>                 )
> >>             )
> >>         )
> >>     ));
> >>     $this->elasticsearchClient->indices()->create($params);
> >>     return;
> >> }
> >>
> >>
> >> My Indexing
> >>
> >> public function indexTable($pf, $typeElement){
> >>
> >>     $params = array(
> >>         "index" => '_river',
> >>         "type" => $typeElement,
> >>         "id" => "_meta",
> >>         "body" => array(
> >>             "type" => "jdbc",
> >>             "jdbc" => array(
> >>                 "url" => "jdbc:mysql://ip/name",
> >>                 "user" => 'root',
> >>                 "password" => 'mdp',
> >>                 "index" => $pf,
> >>                 "type" => $typeElement,
> >>                 // the SQL query must be a quoted PHP string
> >>                 "sql" => "select id_source as _id, id_sous_theme, titre_source, desc_source, note_source, adresse_source, type_source, date_source from source",
> >>                 "max_bulk_requests" => 5,
> >>             )
> >>         )
> >>     );
> >>
> >>     $this->elasticsearchClient->index($params);
> >> }
> >>
> >> Thanks in advance.
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "elasticsearch" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> > email to elasticsearc...@googlegroups.com.
> > To view this discussion on the web visit
> > https://groups.google.com/d/msgid/elasticsearch/5d93217c-bded-40fa-8fd2-fdac576c57ee%40googlegroups.com.
> >
> > For more options, visit https://groups.google.com/d/optout.