Re: I can't find anything after hypens or underscores

2015-01-30 Thread Alessandro Bonfanti

  
  

Re: I can't find anything after hypens or underscores

2015-01-28 Thread Alessandro Bonfanti

  
  

Re: I can't find anything after hypens or underscores

2015-01-26 Thread Alessandro Bonfanti

  
  

Re: I can't find anything after hypens or underscores

2015-01-21 Thread Alessandro Bonfanti

  
  

Re: I can't find anything after hypens or underscores

2014-12-02 Thread Alessandro Bonfanti

  
  

Re: I can't find anything after hypens or underscores

2014-11-12 Thread Alessandro Bonfanti

  
  

Re: I can't find anything after hypens or underscores

2014-11-12 Thread Nikolas Everett
On Wed, Nov 12, 2014 at 11:13 AM, Alessandro Bonfanti 
wrote:

> Thank you very much for your answer.
> What I want is for ES to store the fields as literals, so that I can find
> ZNF6092 with a wildcard search (*ZNF6092*, for example).
> I tried setting "pattern" to "*" as a test (* never appears in
> gene_shortname, so I assume the entire string is stored), but I still find
> nothing.
>
>
You'd have to post your queries for me to help more, but in general it is
better to analyze the content up front and run basic match queries than it
is to search with wildcards. Wildcards are way, way, way slower.
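
A rough way to picture the cost difference, as a toy sketch in plain Python rather than anything Elasticsearch actually runs: a match on an analyzed field is a single lookup in the term dictionary, while a leading-wildcard search has to test the pattern against every stored term. The documents, ids, and the analyze() helper below are made up for illustration.

```python
from fnmatch import fnmatchcase

# Made-up documents: doc id -> gene_short_name value.
docs = {1: "linc-ZNF6092-6", 2: "linc-ABC1-2", 3: "XLOC_000123"}


def analyze(text):
    """Assumed analysis: split on '-' or '_' and lowercase each piece."""
    return [t.lower() for t in text.replace("_", "-").split("-") if t]


# Analyzed field: each token becomes a key in the inverted index.
analyzed_index = {}
for doc_id, value in docs.items():
    for token in analyze(value):
        analyzed_index.setdefault(token, set()).add(doc_id)

# Literal (not analyzed) field: the whole value is the only term.
literal_index = {}
for doc_id, value in docs.items():
    literal_index.setdefault(value, set()).add(doc_id)


def match(query):
    """One dictionary lookup per query token."""
    hits = set()
    for token in analyze(query):
        hits |= analyzed_index.get(token, set())
    return hits


def wildcard(pattern):
    """Scans every term in the literal index and tests the pattern on each."""
    hits = set()
    for term, ids in literal_index.items():
        if fnmatchcase(term, pattern):
            hits |= ids
    return hits


if __name__ == "__main__":
    print(match("ZNF6092"))       # lookup in the analyzed index
    print(wildcard("*ZNF6092*"))  # full scan of the literal terms
```

Both calls find the same document here; the difference is that wildcard() does work proportional to the number of distinct terms, which is what hurts on a large index.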

Nik

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAPmjWd0itbdHQ-maOuOmrrYf2QCqMFORTG21QpFHOCrp9E0rmg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: I can't find anything after hypens or underscores

2014-11-12 Thread Alessandro Bonfanti

  
  

Thank you very much for your answer.
What I want is for ES to store the fields as literals, so that I can find
ZNF6092 with a wildcard search (*ZNF6092*, for example).
I tried setting "pattern" to "*" as a test (* never appears in
gene_shortname, so I assume the entire string is stored), but I still find
nothing.
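
For reference, here is a small sketch of what a literal (not_analyzed) field implies for matching. It is a local simulation in Python, not Elasticsearch itself: the whole value is kept as one term, so an exact term lookup for a fragment fails, a wildcard pattern around the fragment matches, and case matters because wildcard patterns are not analyzed. The wildcard_query dict shows the shape of a wildcard query body; the field name is taken from the mapping in this thread, everything else is illustrative.

```python
import json
from fnmatch import fnmatchcase

# With "index": "not_analyzed" the whole value is stored as a single term.
stored_term = "linc-ZNF6092-6"


def term_matches(query):
    """Exact term lookup: only the complete stored value matches."""
    return query == stored_term


def wildcard_matches(pattern):
    """Case-sensitive wildcard test against the single stored term."""
    return fnmatchcase(stored_term, pattern)


# Shape of a wildcard query body that could be sent to the _search endpoint.
wildcard_query = {"query": {"wildcard": {"gene_short_name": "*ZNF6092*"}}}

if __name__ == "__main__":
    print(term_matches("ZNF6092"))        # fragment is not the whole term
    print(wildcard_matches("*ZNF6092*"))  # pattern covers the whole term
    print(wildcard_matches("*znf6092*"))  # lowercase pattern, uppercase term
    print(json.dumps(wildcard_query, indent=2))
```

If a wildcard search like this still returns nothing against the real index, it may be worth checking that the mapping was actually applied to the index being queried, since the simulation assumes it was.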

  






Re: I can't find anything after hypens or underscores

2014-11-12 Thread Nikolas Everett
On Wed, Nov 12, 2014 at 8:15 AM, Alessandro Bonfanti 
wrote:

> Hi, I'm very new to Elasticsearch.
> I'm trying to index a set of biological data. There are some fields, like
> 'gene_id' or 'gene_shortname', that should be processed as literal strings.
> When I search for 'ZNF6092' in a field containing 'linc-ZNF6092-6', I
> can't find anything. When I search for 'linc', however, I find the correct
> document.
> It seems to be a problem with the ES analyzer. I tried to set the fields
> to not be analyzed, but nothing seems to change.
> I tried this:
>
> curl -XPOST 'localhost:9200/a3' -d @tracking_map.json
>
> where tracking_map.json is
>
> {
>   "mappings": {
>     "tracking": {
>       "properties": {
>         "tracking_id" : {
>           "type": "string",
>           "index": "not_analyzed"
>         },
>         "nearest_ref_id" : {
>           "type": "string",
>           "index": "not_analyzed"
>         },
>         "gene_id" : {
>           "type": "string",
>           "index": "not_analyzed"
>         },
>         "gene_short_name" : {
>           "type": "string",
>           "index": "not_analyzed"
>         }
>       }
>     }
>   }
> }
>
>
>
> Then I re-indexed all the documents. Where did I go wrong?
> Thanks in advance,
>
> Alessandro
>
>
It's an analyzer problem, certainly. You've turned off analyzers with
"index":"not_analyzed". What you probably want is for gene_short_name to be
analyzed so that dashes are considered "word separators". If you do that,
you can find linc-ZNF6092-6 by performing a simple_query_string (or match)
search for ZNF6092, ZNF6092 6, 6, or linc. Have a look at
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-pattern-tokenizer.html
and go from there. You may also want to use a lowercase filter so you can
search for znf6092 and still find it.
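
As a minimal sketch of that suggestion, the Python below builds an index-creation body that pairs a pattern tokenizer with a lowercase filter, and approximates the resulting tokens locally with a regular expression. The names gene_tokenizer and gene_analyzer and the separator pattern [-_]+ are assumptions for illustration, not something taken from the thread, and the local simulation is only an approximation of what the real analyzer would emit.

```python
import json
import re

# Assumption: hyphens and underscores are the only separators that matter.
SEPARATOR_PATTERN = r"[-_]+"

# Candidate body for creating the index (hypothetical analyzer names).
index_body = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "gene_tokenizer": {"type": "pattern", "pattern": SEPARATOR_PATTERN}
            },
            "analyzer": {
                "gene_analyzer": {
                    "type": "custom",
                    "tokenizer": "gene_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    "mappings": {
        "tracking": {
            "properties": {
                "gene_short_name": {"type": "string", "analyzer": "gene_analyzer"}
            }
        }
    },
}


def simulate_analysis(text):
    """Local approximation: split on the separators, then lowercase."""
    return [tok.lower() for tok in re.split(SEPARATOR_PATTERN, text) if tok]


if __name__ == "__main__":
    print(json.dumps(index_body, indent=2))
    print(simulate_analysis("linc-ZNF6092-6"))
```

With tokens like these in the index, a match query for ZNF6092 or znf6092 is analyzed the same way and lands on the same token, which is why no wildcard is needed.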

This is a good read on how to change the mapping as well:
http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/
Even if you don't need all the information in there, it is nice to know.

Nik
