Re: memory problem ( [WARN ][monitor.jvm] .... ) ?

2014-07-29 Thread Tanguy Bernard
I found the solution to my problem. 
My note_source field contained base64-encoded pictures and my nGram max was 
too large, so indexing took a very long time. Conclusion: I got [WARN ][monitor.jvm].

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/98c74acb-45b9-4a0c-9d57-2e694d8ea76e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Error for elasticsearch php api

2014-07-24 Thread Tanguy Bernard

> My answer is very late, but here :
>
>> require '../vendor/autoload.php';
>> 
>> $client = new Elasticsearch\Client();
>> 
>> $params = array();
>> $params['body']  = array('testField' => 'abc');
>> 
>> $params['index'] = 'my_index';
>> $params['type']  = 'my_type';
>> $params['id']    = 'my_id';
>> 
>> // Document will be indexed to my_index/my_type/my_id
>> $ret = $client->index($params);
>> 
>> /**
>>  * SEARCH
>>  **/
>> $params['index'] = 'my_index';
>> $params['type']  = 'my_type';

// HERE your params = array('index' => 'my_index', 'type' => 'my_type', 'body' => array('testField' => 'abc'))
// Your body is incorrect for a search.
// I suggest this if you want to find the result for a specific id :

$params = array(); // reset first
$params = array(
    'index' => 'my_index',
    'type'  => 'my_type',
    'body'  => array(
        'query' => array(
            'match' => array(
                '_id' => 'my_id',
            )
        )
    ),
);

$results = $client->search($params);

>> ?>

Tanguy 
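(A side note, not from the original thread: when the goal is simply to fetch one known document by its id, a search is not needed at all — Elasticsearch exposes the document directly at its URL. A minimal sketch, assuming the same my_index/my_type/my_id values and a default local node:)

```
curl -XGET 'localhost:9200/my_index/my_type/my_id?pretty'
```

The indexed fields come back under the "_source" key of the response.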



Re: index doc, pdf, odt .... => cluster : yellow, why ?

2014-07-10 Thread Tanguy Bernard
Thank you Rafał Kuć.  Really interesting.

Tanguy

On Thursday, July 10, 2014 at 16:30:25 UTC+2, Rafał Kuć wrote:
>
>  Hello!
>
> Multiple nodes for high availability - if a node crashes, you don't want 
> to lose Elasticsearch and the ability to search and analyze your data. 
> Also for better performance - you can have your index built of many 
> shards spread across multiple nodes, with replicas of those shards, so 
> you have physical copies that can handle the traffic if one node is not 
> able to handle it. 
>
>
>
>
>
> -- 
> Regards, Rafał Kuć
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>  
>  Thank you.
>
> In my case I have many clients; I use one index per client and a single node.
> I started working with Elasticsearch recently, and for the moment I don't 
> understand why I would use a second node. 



Re: index doc, pdf, odt .... => cluster : yellow, why ?

2014-07-10 Thread Tanguy Bernard
Thank you.

In my case I have many clients; I use one index per client and a single node.
I started working with Elasticsearch recently, and for the moment I don't 
understand why I would use a second node. 



Re: index doc, pdf, odt .... => cluster : yellow, why ?

2014-07-10 Thread Tanguy Bernard
Health of cluster: yellow (35 / 50)

curl -XGET 'localhost:9200/_cluster/health?pretty'
{
   "cluster_name": "elasticsearch",
   "status": "yellow",
   "timed_out": false,
   "number_of_nodes": 1,
   "number_of_data_nodes": 1,
   "active_primary_shards": 35,
   "active_shards": 35,
   "relocating_shards": 0,
   "initializing_shards": 0,
   "unassigned_shards": 15
}
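(For context, not part of the original message: a yellow status with 15 unassigned shards on a single node usually means some indices were created with replicas, and the replica shards have no second node to live on. A hedged sketch of one way to clear this on a one-node setup — dropping replicas for all existing indices:)

```
curl -XPUT 'localhost:9200/_settings' -d '
{
    "index" : {
        "number_of_replicas" : 0
    }
}'
```

After this, cluster health should report green; if replicas are actually wanted for availability, the alternative is to add a second node instead.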



index doc, pdf, odt .... => cluster : yellow, why ?

2014-07-10 Thread Tanguy Bernard
Hello, this is my code for indexing (I use this approach because I want to 
search inside a document's content).

PUT test

PUT test/my_type/_mapping
{
    "my_type" : {
        "properties" : {
            "my_file" : {
                "type" : "attachment",
                "fields" : {
                    "my_file" : { "term_vector" : "with_positions_offsets" }
                }
            }
        }
    }
}


I open my document, convert it to base64, and index it.

PUT test/my_type/1
{
    "my_file" : "my file in base64",
    "name" : "the name of the file"
}

And I search :

GET test/my_etape/_search?pretty=true
{
    "size": 50,
    "query": {
        "query_string": {
            "query": "my keywords"
        }
    },
    "highlight": {
        "fields": {
            "my_file": { "term_vector" : "with_positions_offsets" }
        }
    }
}

I don't understand why my cluster turns yellow.
Do you have an explanation for me, or another way to do this, please ?

Thank you in advance.




Re: have we a way to use highlight and fuzzy together ?

2014-07-07 Thread Tanguy Bernard
I want to combine like this :

GET my_index/my_type/_search?pretty=true
{
    "size": 50,
    "query": {
        "multi_match": {
            "query": "my words",
            "fields": ["title_doc"]
        },
        "fuzzy": 0.2
    },
    "highlight" : {
        "fields" : {
            "title_doc" : { "fragment_size" : 30 }
        }
    },
    "sort": [
        {
            "date_source": {
                "order": "desc"
            }
        }
    ]
}

Can you help me ?
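(An editorial sketch, not from the thread: "fuzzy" is not a standalone option placed next to another query in the query DSL. In Elasticsearch 1.x, fuzziness is passed as a parameter of the match query itself, and it combines fine with highlighting and sorting. The fuzziness value of 2 below is an assumption, not taken from the thread:)

```
curl -XGET 'localhost:9200/my_index/my_type/_search?pretty' -d '
{
    "size": 50,
    "query": {
        "match": {
            "title_doc": {
                "query": "my words",
                "fuzziness": 2
            }
        }
    },
    "highlight": {
        "fields": {
            "title_doc": { "fragment_size": 30 }
        }
    },
    "sort": [
        { "date_source": { "order": "desc" } }
    ]
}'
```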

Tanguy



Re: have we a way to use highlight and fuzzy together ?

2014-07-07 Thread Tanguy Bernard
I want to combine like this :

GET my_index/my_type/_search?pretty=true
{
    "size": 50,
    "query": {
        "fuzzy": 0.2,
        "multi_match": {
            "query": "my words",
            "fields": ["title_doc"]
        }
    },
    "highlight" : {
        "order" : "date_doc",
        "fields" : {
            "title_doc" : { "fragment_size" : 30 }
        }
    }
}

Can you help me ?

Tanguy



Re: problem index date yyyy-MM-dd'T'HH:mm:ss.SSS

2014-07-02 Thread Tanguy Bernard
Yes, it's just a few dates. I think they can be updated quickly. That's the 
better way :)
Thank you all.

On Wednesday, July 2, 2014 at 12:56:59 UTC+2, David Pilato wrote:
>
> I would recommend updating the SQL database! :)
>
> So maybe update all dates where the date is 0000-00-00 to 1970-01-01, if it 
> fits with your use case.
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet <https://twitter.com/dadoonet> | @elasticsearchfr 
> <https://twitter.com/elasticsearchfr>
>
>
> On July 2, 2014 at 12:54:36, Tanguy Bernard (bernardt...@gmail.com) wrote:
>
> This date is created when a document is created, but an error occurred and I 
> got this 0000-00-00 ^^ 
> I'm in a company that has existed for 10 years; the database is old and 
> contains this kind of error.
>
> For the moment, I will use :
>
> "sql" : "select id_source as _id, title_source, date_source from source",  // 
> if I add "where date_source not like '%0000%'", it works, but values are 
> then missing for this date
> Or not index date_source at all. My goal was to sort my results by date_source.
>
> On Wednesday, July 2, 2014 at 12:40:58 UTC+2, David Pilato wrote: 
>>
>> What is this date supposed to represent? 
>> month = 0 or day = 0 does not exist, right? 
>>
>>  -- 
>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com* 
>>  @dadoonet <https://twitter.com/dadoonet> | @elasticsearchfr 
>> <https://twitter.com/elasticsearchfr>
>>



Re: problem index date yyyy-MM-dd'T'HH:mm:ss.SSS

2014-07-02 Thread Tanguy Bernard
This date is created when a document is created, but an error occurred and I 
got this 0000-00-00 ^^
I'm in a company that has existed for 10 years; the database is old and 
contains this kind of error.

For the moment, I will use :

"sql" : "select id_source as _id, title_source, date_source from source",  // 
if I add "where date_source not like '%0000%'", it works, but values are 
then missing for this date
Or not index date_source at all. My goal was to sort my results by date_source.

On Wednesday, July 2, 2014 at 12:40:58 UTC+2, David Pilato wrote:
>
> What is this date supposed to represent? 
> month = 0 or day = 0 does not exist, right? 
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet  | @elasticsearchfr 
> 
>



have we a way to use highlight and fuzzy together ?

2014-07-02 Thread Tanguy Bernard
Hello,
Everything is in the subject.
I have to use fuzzy for my fields (title, content), and when I'm searching I 
want to see the part of the sentence where my keyword is.

This, together, doesn't work:
$params['body']['highlight']['fields'][$value]['fragment_size']=30;
$params['body']['query']['fuzzy']=0.2;

Is there a way to use highlight and fuzzy together, or an equivalent 
alternative ?


Thank you in advance.



Re: problem index date yyyy-MM-dd'T'HH:mm:ss.SSS

2014-07-02 Thread Tanguy Bernard
In my mysql table (type : datetime) :

| date_source         |
+---------------------+
| 2008-09-15 18:29:07 |
| 2013-08-29 00:00:00 |
| 2013-07-04 00:00:00 |
| 2013-07-17 00:00:00 |
| 2013-07-17 00:00:00 |
| 0000-00-00 00:00:00 |
...
If I use a mapping (type : string).

And I index :

PUT /_river/test/_meta
{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://ip:port/database",
        "user" : "user",
        "password" : "password",
        "sql" : "select id_source as _id, title_source, date_source from source",  // if I add "where date_source not like '%0000%'", it works, but values are then missing for this date
        "index" : "test",
        "type" : "source",
        "max_bulk_requests" : 5
    }
}
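(An editorial sketch, untested, assuming MySQL: instead of filtering out whole rows with the "not like" clause, the broken dates alone can be turned into NULL on the database side with NULLIF, so the other columns of those rows are still indexed:)

```
"sql" : "select id_source as _id, title_source,
         nullif(date_source, '0000-00-00 00:00:00') as date_source
         from source"
```

NULLIF is evaluated server-side, so the JDBC driver never sees the zero date at all.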




On Wednesday, July 2, 2014 at 12:09:58 UTC+2, vineeth mohan wrote:
>
> Hi Tanguy,
>
> How is this a valid date string - "java.io.IOException: 
> java.sql.SQLException: Value '7918-00-00 00:00:00  " ?
> This value can't be mapped to any date format, nor is it valid in any way.
>
> Thanks
> Vineeth
>
>
>
>
> On Wed, Jul 2, 2014 at 3:21 PM, Tanguy Bernard wrote:
>
>> As it stands, when I index the date 0000-00-00 00:00:00, the indexing stops 
>> completely with an error (the beginning works, then it stops instantly). 
>> I have tried to set (in the mapping) type : string for my date, but it 
>> doesn't work.
>>
>> Do you have an idea to solve my problem ?
>>



Re: problem index date yyyy-MM-dd'T'HH:mm:ss.SSS

2014-07-02 Thread Tanguy Bernard
As it stands, when I index the date 0000-00-00 00:00:00, the indexing stops 
completely with an error (the beginning works, then it stops instantly). 
I have tried to set (in the mapping) type : string for my date, but it 
doesn't work.

Do you have an idea to solve my problem ?



problem index date yyyy-MM-dd'T'HH:mm:ss.SSS

2014-07-02 Thread Tanguy Bernard
Hello,

I try to index MySQL datetime values like this : 2013-05-01 00:00:00 
In ES they are represented like this : 2013-05-01T00:00:00.000Z
The real problem seems to be when I index this date : 0000-00-00 00:00:00

I have used this mapping :

"type" : "date",
"format" : "yyyy-MM-dd HH:mm:ss||MM/dd/yyyy||yyyy/MM/dd",
"index" : "not_analyzed"

I have obtained this error :

[2014-07-02 10:11:56,503][INFO ][cluster.metadata ] [ik-test2] 
[_river] update_mapping [source] (dynamic)
can not be represented as java.sql.Timestamp
java.io.IOException: java.sql.SQLException: Value '7918-00-00 00:00:00 
 ...

 can not be represented as java.sql.Timestamp
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1078)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:989)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:975)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:920)
at 
com.mysql.jdbc.ResultSetRow.getTimestampFast(ResultSetRow.java:1102)
at com.mysql.jdbc.BufferRow.getTimestampFast(BufferRow.java:576)
at 
com.mysql.jdbc.ResultSetImpl.getTimestampInternal(ResultSetImpl.java:6592)
at 
com.mysql.jdbc.ResultSetImpl.getTimestamp(ResultSetImpl.java:6192)
at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:5058)
at 
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.processRow(SimpleRiverSource.java:590)
at 
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.nextRow(SimpleRiverSource.java:565)
at 
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.merge(SimpleRiverSource.java:356)
at 
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.execute(SimpleRiverSource.java:257)
at 
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.fetch(SimpleRiverSource.java:228)
... 3 more
[2014-07-02 10:11:56,633][WARN 
][org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverFlow] 
aborting river
[2014-07-02 10:12:01,392][INFO 
][org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth] new 
bulk [1] of [69 items], 1 outstanding bulk requests
[2014-07-02 10:12:01,437][INFO ][cluster.metadata ] [ik-test2] 
[my_index] update_mapping [source] (dynamic)



Can you help me with my problem ?

Thank you in advance.
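(For what it's worth, an editorial note: MySQL Connector/J has a connection property aimed at exactly this "can not be represented as java.sql.Timestamp" error — zeroDateTimeBehavior=convertToNull makes the driver return NULL for zero dates instead of throwing. A sketch of the river's url with it, assuming the JDBC river passes the URL through to the driver unchanged:)

```
"url" : "jdbc:mysql://ip:port/database?zeroDateTimeBehavior=convertToNull"
```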




highlight, number_of_fragments seem does not work

2014-06-30 Thread Tanguy Bernard
Hello,
I'm using this elasticsearch client : 
https://github.com/nervetattoo/elasticsearch 
and I have no "highlight" result when I search :

$params = array(
    "index" => "my_index",
    "type"  => "my_type",
    "body"  => array(
        "size" => 30,
        'query' => array(
            'multi_match' => array(
                'fields' => array('title', 'description'),
                'query'  => $keywords,
            ),
        ),
    )
);

$params['body']['highlight']['fields'][$value]['number_of_fragments'] = 20; // does not work, I don't know why?


$results = $this->elasticsearchClient->search($params);

Can you help me ?
Thank you in advance.


Tanguy




Re: Rivers are reimporting data at each ElasticSearch restart

2014-06-25 Thread Tanguy Bernard
Hello,
This post interests me.
Is there a way to know when indexing is finished, so as to then trigger the 
XDELETE of the _river?
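(As Jörg explains below, a single river can be deleted gracefully with curl -XDELETE, as opposed to wiping the whole _river index. Assuming a river registered at _river/test/_meta, as elsewhere in this archive, that would look like:)

```
curl -XDELETE 'localhost:9200/_river/test/'
```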

On Wednesday, June 25, 2014 at 17:54:01 UTC+2, Jörg Prante wrote:
>
> It is up to the river implementation how the data import is handled.
>
> The JDBC river, in the "simple" strategy, imports data when the river is 
> started, regardless of existing cluster or index. It is possible to 
> implement other strategies, for example, a strategy that performs a check 
> before indexing.
>
> There is no support for telling river implementations about node start/stop 
> events and how to behave. The JDBC river tries to compensate for this by 
> persisting a JDBC-river-specific state. This state is useful for flow 
> control.
>
> If you no longer need the river, you can delete it with curl 
> -XDELETE; this shuts down river instance threads gracefully and releases 
> resources.
>
> If you delete the _river index with curl -XDELETE, you wipe all data that 
> is used by rivers. Active river instances are not stopped and are not aware 
> of what happened, so this is an unfriendly way to terminate river runs; all 
> kinds of river errors may occur.
>
> Jörg
>
>
>
> On Wed, Jun 25, 2014 at 5:38 PM, Stéphane Seng wrote:
>
>> Hello,
>>
>> I have a question about the fact that, when rivers are used to import 
>> data into ElasticSearch, rivers are also reimporting data at each 
>> ElasticSearch restart.
>>
>> In our project, what we are doing is as follows :
>>
>>- Raw data is imported into ElasticSearch from a MySQL database using 
>>the JDBC river (https://github.com/jprante/elasticsearch-river-jdbc); 
>>- Some updates are executed directly on the newly imported data in 
>>ElasticSearch using POST requests;
>>- In the end, the final data stored in ElasticSearch is not the same 
>>as the imported raw data.
>>
>> The problem we are facing is that when ElasticSearch is restarted, the 
>> JDBC river is reimporting the raw data thus overriding the transformations 
>> made.
>> We suppose that this is an intentional behavior from ElasticSearch rivers.
>> One solution to avoid the reimporting of data is to delete the 
>> corresponding _river index, which is supposed to store the state of the 
>> rivers.
>>
>> Our questions are as follows :
>>
>>- Is the reimporting of data from rivers at each restart a 
>>standard use case ? Is it useful for some applications ?
>>- What is the point of the _river index state saving ? 
>>   - Is there a way to avoid the reimporting of data without having 
>>   to delete the corresponding _river index ?
>>   - Are there any downsides (for our use case) to deleting the 
>>   corresponding _river index ?
>>   
>> Thanks,
>> Stéphane.
>>



problem date error 0000-00-00

2014-06-25 Thread Tanguy Bernard
Hello,

I have a problem with the date format: when I'm indexing my SQL table, I get 
this :
'0000-00-00' can not be represented as java.sql.Date

I have tried this for my mapping :

my_date = array(
    'type' => 'date',
    'format' => 'date',
    'null_value' => '0000-00-00',
);

but this did not work !

Can you help me to find an answer ?

Thank you in advance.

Tanguy



Re: problem indexing with my analyzer

2014-06-25 Thread Tanguy Bernard
Yes I did not know how nGram works !
I find a perfect solution for my picture (base64) problem : use *'char_filter' 
=>array('html_strip'),*


public function createSetting($pf){
    $params = array('index' => $pf, 'body' => array(
        'settings' => array(
            'number_of_shards' => 5,
            'number_of_replicas' => 0,
            'analysis' => array(
                'filter' => array(
                    'MYnGram' => array(
                        "token_chars" => array(),
                        "type" => "nGram",
                        "min_gram" => 3,
                        "max_gram" => 20
                    )
                ),
                'analyzer' => array(
                    'reuters' => array(
                        'type' => 'custom',
                        'tokenizer' => 'standard',
                        'filter' => array('lowercase', 'asciifolding', 'MYnGram'),
                        'char_filter' => array('html_strip'),
                    ),
                )
            )
        )
    ));
    $this->elasticsearchClient->indices()->create($params);
}

Thanks to all of you !
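(An editorial sketch: a quick way to verify that html_strip is removing the embedded images is the analyze API, assuming an index was created with the settings above so the custom "reuters" analyzer exists on it:)

```
curl -XGET 'localhost:9200/my_index/_analyze?analyzer=reuters&pretty' \
     -d 'some text <img src="data:image/png;base64,iVBORw0KGgo="> more text'
```

The returned tokens should contain only ngrams of the surrounding text, not of the base64 payload.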


On Saturday, June 21, 2014 at 00:35:39 UTC+2, Clinton Gormley wrote:
>
> You seriously don't want ngrams of length 3..250. That's ENORMOUS.
>
> Typically, set min/max to 3 or 4, and that's it.
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_ngrams_for_partial_matching.html#_ngrams_for_partial_matching
>
>
> On 20 June 2014 16:05, Tanguy Bernard wrote:
>
>> Thank you Cédric Hourcade !
>>
On Friday, June 20, 2014 at 15:32:29 UTC+2, Cédric Hourcade wrote:
>>
>>> If your base64 strings are long, they are going to be split into a lot 
>>> of tokens by the standard tokenizer. 
>>>
>>> These tokens are often going to be a lot longer than standard words, 
>>> so your nGram filter will generate even more tokens, many more than 
>>> with standard text. That may be your problem there. 
>>>
>>> You should really try to strip the encoded images from your documents 
>>> with a simple regex before indexing them. If you need to keep the 
>>> source, put the raw text in an unindexed field, and the cleaned one in 
>>> another. 
>>>



Re: problem indexing with my analyzer

2014-06-20 Thread Tanguy Bernard
Thank you Cédric Hourcade !

On Friday, June 20, 2014 at 15:32:29 UTC+2, Cédric Hourcade wrote:
>
> If your base64 strings are long, they are going to be split into a lot 
> of tokens by the standard tokenizer. 
>
> These tokens are often going to be a lot longer than standard words, 
> so your nGram filter will generate even more tokens, many more than 
> with standard text. That may be your problem there. 
>
> You should really try to strip the encoded images from your documents 
> with a simple regex before indexing them. If you need to keep the 
> source, put the raw text in an unindexed field, and the cleaned one in 
> another. 
>



Re: problem indexing with my analyzer

2014-06-20 Thread Tanguy Bernard
The user copy/pastes the content of an HTML page, and I index that 
information. I take the entire document, images included. I can't change 
this behavior.

I set max_gram=20. It's better, but at the end I get this many times :

[2014-06-20 11:42:14,201][WARN ][monitor.jvm  ] [ik-test2] 
[gc][young][528][263] duration [2s], collections [1]/[2.1s], total 
[2s]/[43.9s], memory [536mb]->[580.2mb]/[1015.6mb], all_pools {[young] 
[22.5mb]->[22.3mb]/[66.5mb]}{[survivor] [14.9kb]->[49.3kb]/[8.3mb]}{[old] 
[513.4mb]->[557.8mb]/[940.8mb]}

I set ES_HEAP_SIZE to 2g. I think that's enough.
Is something wrong ?

On Friday, June 20, 2014 at 11:45:22 UTC+2, Cédric Hourcade wrote:
>
> If you are only searching the text, you should index the images in 
> another field. With no analyzer ("index: not_analyzed"), or 
> even better "index: no" (not indexed). If you need to retrieve the 
> image data, it's still in the _source. 
>
> But to be honest I wouldn't even store this kind of information in ES: 
> your index is going to be bigger, merges are going to be slower... I'd 
> keep the binary files stored elsewhere. 
>
> Cédric Hourcade 
> c...@wal.fr  
>
>
> On Fri, Jun 20, 2014 at 11:25 AM, Tanguy Bernard wrote: 
> > Yes, I am applying "reuters" on my document (composed of text and 
> > pictures). 
> > My goal is to search the text of the document with any word or 
> > part of a word. 
> > 
> > Yes, the problem is my nGram filter. 
> > How do I solve this problem ? Decrease nGram max ? Change the analyzer to 
> > another one that satisfies my goal ? 
> > 
> >> On Friday, June 20, 2014 at 10:58:49 UTC+2, Cédric Hourcade wrote: 
> >> 
> >> Does it mean you're applying the "reuters" analyzer on your base64 
> >> encoded pictures? 
> >> 
> >> I guess it generates a really huge number of tokens for each entry 
> >> because of your nGram filter (with a max at 250). 
> >> 
> >> Cédric Hourcade 
> >> c...@wal.fr 
> >> 
> >> 
> >> On Fri, Jun 20, 2014 at 9:09 AM, Tanguy Bernard 
> >>  wrote: 
> >> > Information 
> >> > My "note_source" contains pictures (.jpg, .png ...) in base64, and text. 
> >> > 
> >> > For my mapping I have used : 
> >> > "type" => "string" 
> >> > "analyzer" => "reuteurs" (the name of my analyzer) 
> >> > 
> >> > 
> >> > Any idea ? 
> >> > 
> >> > On Thursday, June 19, 2014 at 17:57:46 UTC+2, Tanguy Bernard wrote: 
> >> >> 
> >> >> Hello 
> >> >> I have some issue, when I index a particular data "note_source" (sql 
> >> >> longtext). 
> >> >> I use the same analyzer for each fields (except date_source and 
> >> >> id_source) 
> >> >> but for "note_source", I have a "warn monitor.jvm". 
> >> >> When I remove "note_source", everything fine. If I don't use 
> analyzer 
> >> >> on 
> >> >> "note_source", everything fine, but if I use my analyzer on 
> >> >> "note_source" I 
> >> >> have some crash. 
> >> >> 
> >> >> I think I have enough memory, I have used ES_HEAP_SIZE. 
> >> >> Maybe my problem it's with accent (ascii, utf-8) 
> >> >> 
> >> >> Can you help me with this ? 
> >> >> 
> >> >> 
> >> >> 
> >> >> My Setting 
> >> >> 
> >> >>  public function createSetting($pf){ 
> >> >> $params = array('index' => $pf, 'body' => array( 
> >> >> 'settings' => array( 
> >> >> 'number_of_shards' => 5, 
> >> >> 'number_of_replicas' => 0, 
> >> >> 'analysis' => array( 
> >> >> 'filter' => array( 
> >> >> 'nGram' => array( 
> >> >> "token_chars" =>array(), 
> >> >> "type" => "nGram", 
> >> >> "min_gram" => 3, 
> >> >> "max_gram"  => 250 
> >> >> ) 
> >> 
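Cédric's suggestion at the top of this thread — keep the base64 payload in a separate, non-indexed field — would look roughly like this with the PHP client used elsewhere in these messages. A minimal sketch: the split field names `note_text` and `note_image` are my own, not part of the original schema, and the array is only built and printed here (it would be passed to `$client->indices()->create($params)`).

```php
<?php
// Sketch: split "note_source" into an analyzed text field and a
// non-indexed field holding the raw base64 image data. The image
// field is not analyzed or indexed, but is still returned in _source.
$params = array(
    'index' => 'my_index',
    'body'  => array(
        'mappings' => array(
            'source' => array(
                'properties' => array(
                    // searchable text, analyzed with the custom analyzer
                    'note_text' => array(
                        'type'     => 'string',
                        'analyzer' => 'reuters',
                    ),
                    // raw base64 payload: skipped by the nGram analyzer
                    'note_image' => array(
                        'type'  => 'string',
                        'index' => 'no',
                    ),
                ),
            ),
        ),
    ),
);

echo json_encode($params['body']['mappings']), "\n";
```

With this split, the nGram filter never sees the base64 data, which is what was generating the flood of tokens.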

Re: problem indexing with my analyzer

2014-06-20 Thread Tanguy Bernard
I set max_gram=20. It's better, but at the end I get this many times:

[2014-06-20 11:42:14,201][WARN ][monitor.jvm  ] [ik-test2] 
[gc][young][528][263] duration [2s], collections [1]/[2.1s], total 
[2s]/[43.9s], memory [536mb]->[580.2mb]/[1015.6mb], all_pools {[young] 
[22.5mb]->[22.3mb]/[66.5mb]}{[survivor] [14.9kb]->[49.3kb]/[8.3mb]}{[old] 
[513.4mb]->[557.8mb]/[940.8mb]}

I set ES_HEAP_SIZE to 2g. I think that's enough.
Is something wrong?


On Thursday, June 19, 2014 at 17:57:46 UTC+2, Tanguy Bernard wrote:
>
> Hello
> I have an issue when I index a particular field, "note_source" (SQL 
> longtext).
> I use the same analyzer for every field (except date_source and id_source), 
> but for "note_source" I get a "[WARN][monitor.jvm]".
> When I remove "note_source", everything is fine. If I don't use the analyzer on 
> "note_source", everything is fine, but if I use my analyzer on "note_source" I 
> get crashes.
>
> I think I have enough memory; I have set ES_HEAP_SIZE.
> Maybe my problem is with accents (ASCII, UTF-8).
>
> Can you help me with this?
>
>
>
> *My Setting*
>
>  public function createSetting($pf){
> $params = array('index' => $pf, 'body' => array(
> 'settings' => array(
> 'number_of_shards' => 5,
> 'number_of_replicas' => 0,
> 'analysis' => array(
> 'filter' => array(
> 'nGram' => array(
> "token_chars" =>array(),
> "type" => "nGram",
> "min_gram" => 3,
> "max_gram"  => 250
> )
> ),
> 'analyzer' => array(
> 'reuters' => array(
> 'type' => 'custom',
> 'tokenizer' => 'standard',
> 'filter' => array('lowercase', 'asciifolding', 
> 'nGram')
> )
> )
> )
> )
> ));
> $this->elasticsearchClient->indices()->create($params);
> return;
> }
>
>
> *My Indexing*
>
> public function indexTable($pf,$typeElement){
>
> $params =array(
> "index" =>'_river', 
> "type" => $typeElement, 
> "id" => "_meta", 
> "body" =>array(
>   
> "type" => "jdbc",
> "jdbc" => array(
> "url" => "jdbc:mysql://ip/name",
> "user" => 'root',
> "password" => 'mdp',
> "index" => $pf,
> "type" => $typeElement,
> "sql" => select id_source as _id, id_sous_theme, 
> titre_source, desc_source, note_source, adresse_source, type_source, 
> date_source from source,
> "max_bulk_requests" => 5,  
> )
> )
> 
> );
> 
>  
> $this->elasticsearchClient->index($params);
> }
>
> Thanks in advance.
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/154b8ca2-a130-4062-b5ce-0e0fa63d98fe%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
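For a sense of why max_gram = 250 on base64 blobs caused the GC warnings above while max_gram = 20 is merely heavy: an nGram filter emits one token per substring of each length from min_gram to max_gram. A back-of-the-envelope sketch of that arithmetic (my own illustration, not part of any config in the thread; it treats a 100,000-character blob as a single term, which the standard tokenizer only approximates):

```php
<?php
// Rough count of the tokens an nGram filter emits for a single term of
// length $len: one token per substring of each length between
// $min (min_gram) and $max (max_gram).
function ngramTokenCount($len, $min, $max)
{
    $total = 0;
    for ($n = $min; $n <= min($max, $len); $n++) {
        $total += $len - $n + 1;
    }
    return $total;
}

// A 100,000-character base64 blob, treated as one term for simplicity:
echo ngramTokenCount(100000, 3, 250), "\n"; // max_gram = 250: tens of millions of tokens
echo ngramTokenCount(100000, 3, 20), "\n";  // max_gram = 20: under two million
```

The ratio explains both the original crash and why lowering max_gram only softened the pressure: the base64 data itself has to stay out of the analyzed field.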


Re: problem indexing with my analyzer

2014-06-20 Thread Tanguy Bernard
Yes, I am applying "reuters" to my documents (composed of text and pictures).
My goal is to search the text of the document by any word or part of a 
word.

Yes, the problem is my nGram filter.
How do I solve it? Decrease the nGram max? Switch to another analyzer that 
still satisfies my goal?

On Friday, June 20, 2014 at 10:58:49 UTC+2, Cédric Hourcade wrote:
>
> Does it mean you're applying the "reuters" analyzer on your base64 
> encoded pictures? 
>
> I guess it generates a really huge number of tokens for each entry 
> because of your nGram filter (with a max at 250). 
>
> Cédric Hourcade 
> c...@wal.fr  
>
>
> On Fri, Jun 20, 2014 at 9:09 AM, Tanguy Bernard 
> > wrote: 
> > Information 
> > My "note_source" contain picture (.jpg, .png ...) in base64 and text. 
> > 
> > For my mapping I have used : 
> > "type" => "string" 
> > "analyzer" => "reuteurs" (the name of my analyzer) 
> > 
> > 
> > Any idea ? 
> > 
> > On Thursday, June 19, 2014 at 17:57:46 UTC+2, Tanguy Bernard wrote: 
> >> 
> >> Hello 
> >> I have some issue, when I index a particular data "note_source" (sql 
> >> longtext). 
> >> I use the same analyzer for each fields (except date_source and 
> id_source) 
> >> but for "note_source", I have a "warn monitor.jvm". 
> >> When I remove "note_source", everything fine. If I don't use analyzer 
> on 
> >> "note_source", everything fine, but if I use my analyzer on 
> "note_source" I 
> >> have some crash. 
> >> 
> >> I think I have enough memory, I have used ES_HEAP_SIZE. 
> >> Maybe my problem it's with accent (ascii, utf-8) 
> >> 
> >> Can you help me with this ? 
> >> 
> >> 
> >> 
> >> My Setting 
> >> 
> >>  public function createSetting($pf){ 
> >> $params = array('index' => $pf, 'body' => array( 
> >> 'settings' => array( 
> >> 'number_of_shards' => 5, 
> >> 'number_of_replicas' => 0, 
> >> 'analysis' => array( 
> >> 'filter' => array( 
> >> 'nGram' => array( 
> >> "token_chars" =>array(), 
> >> "type" => "nGram", 
> >> "min_gram" => 3, 
> >> "max_gram"  => 250 
> >> ) 
> >> ), 
> >> 'analyzer' => array( 
> >> 'reuters' => array( 
> >> 'type' => 'custom', 
> >> 'tokenizer' => 'standard', 
> >> 'filter' => array('lowercase', 'asciifolding', 
> >> 'nGram') 
> >> ) 
> >> ) 
> >> ) 
> >> ) 
> >> )); 
> >> $this->elasticsearchClient->indices()->create($params); 
> >> return; 
> >> } 
> >> 
> >> 
> >> My Indexing 
> >> 
> >> public function indexTable($pf,$typeElement){ 
> >> 
> >> $params =array( 
> >> "index" =>'_river', 
> >> "type" => $typeElement, 
> >> "id" => "_meta", 
> >> "body" =>array( 
> >> 
> >> "type" => "jdbc", 
> >> "jdbc" => array( 
> >> "url" => "jdbc:mysql://ip/name", 
> >> "user" => 'root', 
> >> "password" => 'mdp', 
> >> "index" => $pf, 
> >> "type" => $typeElement, 
> >> "sql" => select id_source as _id, id_sous_theme, 
> >> titre_source, desc_source, note_source, adresse_source, type_source, 
> >> date_source from source, 
> >> "max_bulk_requests" => 5, 
> >> ) 
> >> ) 
> >> 
> >> ); 
> >> 
> >> 
> >> $this->elasticsearchClient->index($params); 
> >> } 
> >> 
> >> Thanks in advance. 
> > 
>



Re: problem indexing with my analyzer

2014-06-20 Thread Tanguy Bernard
Information
My "note_source" contain picture (.jpg, .png ...) in base64 and text.

For my mapping I have used :
"type" => "string"
"analyzer" => "reuteurs" (the name of my analyzer)


Any idea ?

On Thursday, June 19, 2014 at 17:57:46 UTC+2, Tanguy Bernard wrote:
>
> Hello
> I have an issue when I index a particular field, "note_source" (SQL 
> longtext).
> I use the same analyzer for every field (except date_source and id_source), 
> but for "note_source" I get a "[WARN][monitor.jvm]".
> When I remove "note_source", everything is fine. If I don't use the analyzer on 
> "note_source", everything is fine, but if I use my analyzer on "note_source" I 
> get crashes.
>
> I think I have enough memory; I have set ES_HEAP_SIZE.
> Maybe my problem is with accents (ASCII, UTF-8).
>
> Can you help me with this?
>
>
>
> *My Setting*
>
>  public function createSetting($pf){
> $params = array('index' => $pf, 'body' => array(
> 'settings' => array(
> 'number_of_shards' => 5,
> 'number_of_replicas' => 0,
> 'analysis' => array(
> 'filter' => array(
> 'nGram' => array(
> "token_chars" =>array(),
> "type" => "nGram",
> "min_gram" => 3,
> "max_gram"  => 250
> )
> ),
> 'analyzer' => array(
> 'reuters' => array(
> 'type' => 'custom',
> 'tokenizer' => 'standard',
> 'filter' => array('lowercase', 'asciifolding', 
> 'nGram')
> )
> )
> )
> )
> ));
> $this->elasticsearchClient->indices()->create($params);
> return;
> }
>
>
> *My Indexing*
>
> public function indexTable($pf,$typeElement){
>
> $params =array(
> "index" =>'_river', 
> "type" => $typeElement, 
> "id" => "_meta", 
> "body" =>array(
>   
> "type" => "jdbc",
> "jdbc" => array(
> "url" => "jdbc:mysql://ip/name",
> "user" => 'root',
> "password" => 'mdp',
> "index" => $pf,
> "type" => $typeElement,
> "sql" => select id_source as _id, id_sous_theme, 
> titre_source, desc_source, note_source, adresse_source, type_source, 
> date_source from source,
> "max_bulk_requests" => 5,  
> )
> )
> 
> );
> 
>  
> $this->elasticsearchClient->index($params);
> }
>
> Thanks in advance.
>



problem indexing with my analyzer

2014-06-19 Thread Tanguy Bernard
Hello
I have an issue when I index a particular field, "note_source" (SQL 
longtext).
I use the same analyzer for every field (except date_source and id_source), 
but for "note_source" I get a "[WARN][monitor.jvm]".
When I remove "note_source", everything is fine. If I don't use the analyzer on 
"note_source", everything is fine, but if I use my analyzer on "note_source" I 
get crashes.

I think I have enough memory; I have set ES_HEAP_SIZE.
Maybe my problem is with accents (ASCII, UTF-8).

Can you help me with this?



*My Setting*

 public function createSetting($pf){
$params = array('index' => $pf, 'body' => array(
'settings' => array(
'number_of_shards' => 5,
'number_of_replicas' => 0,
'analysis' => array(
'filter' => array(
'nGram' => array(
"token_chars" =>array(),
"type" => "nGram",
"min_gram" => 3,
"max_gram"  => 250
)
),
'analyzer' => array(
'reuters' => array(
'type' => 'custom',
'tokenizer' => 'standard',
'filter' => array('lowercase', 'asciifolding', 
'nGram')
)
)
)
)
));
$this->elasticsearchClient->indices()->create($params);
return;
}


*My Indexing*

public function indexTable($pf,$typeElement){
   
$params =array(
"index" =>'_river', 
"type" => $typeElement, 
"id" => "_meta", 
"body" =>array(
  
"type" => "jdbc",
"jdbc" => array(
"url" => "jdbc:mysql://ip/name",
"user" => 'root',
"password" => 'mdp',
"index" => $pf,
"type" => $typeElement,
"sql" => select id_source as _id, id_sous_theme, 
titre_source, desc_source, note_source, adresse_source, type_source, 
date_source from source,
"max_bulk_requests" => 5,  
)
)

);

 
$this->elasticsearchClient->index($params);
}

Thanks in advance.



Re: how manage insert and update sql (river) ?

2014-05-23 Thread Tanguy Bernard
Thank you Jörg, with _id this is a perfect solution.

On Thursday, May 22, 2014 at 11:54:18 UTC+2, Jörg Prante wrote:
>
> If you use the column name _id, you can control the ID of the ES document 
> you created by SQL. If you do not use _id, a random doc ID is generated.
>
> See the README at https://github.com/jprante/elasticsearch-river-jdbc
>
> Jörg
>
>
> On Thu, May 22, 2014 at 11:43 AM, Tanguy Bernard 
> 
> > wrote:
>
>> Hello,
>> I would like to know a way to manage INSERT and UPDATE.
>> Am I forced to delete and then re-index my data?
>> Maybe there is a way to index again without duplicating my data 
>> (INSERT)?
>> Can you help with my problem?
>>
>> I use this :
>>
>> PUT /_river/user/_meta
>> {
>> "type" : "jdbc",
>> "jdbc" : {
>>
>> "url" : "jdbc:mysql://my_adress/my_index",
>> "user" : "my_user",
>> "password" : "my_password",
>> "sql" : "select name_user, firstname_user, id_user from user",
>> "index" : "my_index",
>> "type" : "user",
>> "max_bulk_requests" : 5  
>>
>>
>> }
>> }
>>
>> Thanks in advance.
>>
>>
>
>

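Jörg's advice above — alias the primary key to _id so river runs update existing documents instead of inserting random-ID duplicates — slots into the PHP river definition used earlier in this archive like this. A sketch: the array is only built and printed here; it would be sent with `$client->index($params)` as in the other messages.

```php
<?php
// River definition where the SQL aliases the primary key to _id, so
// re-running the river (e.g. after a restart) updates the existing
// documents rather than creating duplicates with random IDs.
$params = array(
    'index' => '_river',
    'type'  => 'user',
    'id'    => '_meta',
    'body'  => array(
        'type' => 'jdbc',
        'jdbc' => array(
            'url'      => 'jdbc:mysql://my_adress/my_index',
            'user'     => 'my_user',
            'password' => 'my_password',
            // id_user becomes the Elasticsearch document _id
            'sql'      => 'select id_user as _id, name_user, firstname_user from user',
            'index'    => 'my_index',
            'type'     => 'user',
            'max_bulk_requests' => 5,
        ),
    ),
);

echo json_encode($params['body']), "\n";
```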


how manage insert and update sql (river) ?

2014-05-22 Thread Tanguy Bernard
Hello,
I would like to know a way to manage INSERT and UPDATE.
Am I forced to delete and then re-index my data?
Maybe there is a way to index again without duplicating my data (INSERT)?
Can you help with my problem?

I use this :

PUT /_river/user/_meta
{
"type" : "jdbc",
"jdbc" : {

"url" : "jdbc:mysql://my_adress/my_index",
"user" : "my_user",
"password" : "my_password",
"sql" : "select name_user, firstname_user, id_user from user",
"index" : "my_index",
"type" : "user",
"max_bulk_requests" : 5  


}
}

Thanks in advance.



Re: memory problem ( [WARN ][monitor.jvm] .... ) ?

2014-05-21 Thread Tanguy Bernard
I have tried another mapping, and everything is fine, it works.
Can someone explain why it works when I remove "type_source" and 
"note_source" from my mapping?

My new mapping (missing type_source, note_source)
PUT /my_index
{
  "mappings" : {
"source" : {
 "properties" : {

"id_source":{
"type":"string"
},
"adresse_source":{
"type":"string"
},

"titre_source" : {
"type" : "string",
"search_analyzer" : "str_search_analyzer",
"index_analyzer" : "str_index_analyzer"

},
"desc_source":{
"type" : "string",
  
"search_analyzer" : "str_search_analyzer",
"index_analyzer" : "str_index_analyzer"


}
 }
}
  },

  "settings" : {
"analysis" : {
  "analyzer" : {
 "str_search_analyzer" : {

  "tokenizer" : "standard",
  "filter" : ["lowercase", "asciifolding"]
},

"str_index_analyzer" : {
  "tokenizer" : "standard",
  "filter" : ["lowercase", "substring","asciifolding"]
}
  },

  "filter" : {

"substring" : {
"token_chars" :[],
  "type" : "nGram",
  "min_gram" : 3,
  "max_gram"  : 250
} 
 
  
  
  }
}
  }
}



Re: memory problem ( [WARN ][monitor.jvm] .... ) ?

2014-05-21 Thread Tanguy Bernard
I added about 100 MB.
I tried to set ES_HEAP_SIZE=8g and I got a Java error, something like 
"not enough memory".


On Wednesday, May 21, 2014 at 10:55:22 UTC+2, Mark Walkom wrote:
>
> How much heap are you running with, from what I can tell it's around a gig?
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>  
>
> On 21 May 2014 18:27, Tanguy Bernard  >wrote:
>
>> Hello 
>> I tried to index data in Elasticsearch; everything was fine until this 
>> [WARN][monitor.jvm]. 
>> This is the first time it has happened to me. 
>> Can you help me solve my problem? 
>>
>>
>> PUT /my_index/
>> {
>>   "mappings" : {
>> "source" : {
>>  "properties" : {
>> 
>> "id_source":{
>> "type":"string"
>> },
>> "adresse_source":{
>> "type":"string",
>> "search_analyzer" : "str_search_analyzer",
>> "index_analyzer" : "str_index_analyzer"
>> },
>>  "type_source":{
>> "type":"string"
>> },
>> 
>> "titre_source" : {
>> "type" : "string",
>> "search_analyzer" : "str_search_analyzer",
>> "index_analyzer" : "str_index_analyzer"
>> 
>> },
>> "note_source":{
>> "type":"string",
>> "search_analyzer" : "str_search_analyzer",
>> "index_analyzer" : "str_index_analyzer"
>> },
>> "desc_source":{
>> "type" : "string",
>>   
>> "search_analyzer" : "str_search_analyzer",
>> "index_analyzer" : "str_index_analyzer"
>> 
>> 
>> }
>>  }
>> }
>>   },
>>
>>   "settings" : {
>> "analysis" : {
>>   "analyzer" : {
>>  "str_search_analyzer" : {
>> 
>>   "tokenizer" : "standard",
>>   "filter" : ["lowercase", "asciifolding"]
>> },
>>
>> "str_index_analyzer" : {
>>   "tokenizer" : "standard",
>>   "filter" : ["lowercase", "substring","asciifolding"]
>> }
>>   },
>>
>>   "filter" : {
>>
>> "substring" : {
>> "token_chars" :[],
>>   "type" : "nGram",
>>   "min_gram" : 3,
>>   "max_gram"  : 250
>> } 
>>  
>>   
>>   
>>   }
>> }
>>   }
>> }
>>
>> >>
>> {
>>"acknowledged": true
>> }
>>
>>
>> PUT /_river/source/_meta
>> {
>> "type" : "jdbc",
>> "jdbc" : {
>>
>> "url" : "jdbc:mysql:/192.168.50.62:9200/my_index",
>> "user" : "user",
>> "password" : "my-password",
>> "sql" : "select id_source, titre_source, desc_source, note_source, 
>> adresse_source, type_source from source",
>> "index" : "my_index",
>> "type" : "source",
>> "max_bulk_requests" : 5  
>>
>>
>> }
>> }
>>
>> >>
>>
>> [2014-05-21 10:21:27,134][INFO 
>> ][org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth] new 
>> bulk [1] of [99 items], 1 outstanding bulk requests
>> [2014-05-21 10:21:28,813][INFO 
>> ][org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth] new 
>> bulk [2] of [99 items], 2 outstanding bulk requests
>> [2014-05-21 10:21:31,518][WARN ][monitor.jvm  ] [ik-test1] 
>> [gc][young][179][4] duration [1.7s], collections [1]/[3.2s],

memory problem ( [WARN ][monitor.jvm] .... ) ?

2014-05-21 Thread Tanguy Bernard
Hello
I tried to index data in Elasticsearch; everything was fine until this 
[WARN][monitor.jvm].
This is the first time it has happened to me.
Can you help me solve my problem?


PUT /my_index/
{
  "mappings" : {
"source" : {
 "properties" : {

"id_source":{
"type":"string"
},
"adresse_source":{
"type":"string",
"search_analyzer" : "str_search_analyzer",
"index_analyzer" : "str_index_analyzer"
},
 "type_source":{
"type":"string"
},

"titre_source" : {
"type" : "string",
"search_analyzer" : "str_search_analyzer",
"index_analyzer" : "str_index_analyzer"

},
"note_source":{
"type":"string",
"search_analyzer" : "str_search_analyzer",
"index_analyzer" : "str_index_analyzer"
},
"desc_source":{
"type" : "string",
  
"search_analyzer" : "str_search_analyzer",
"index_analyzer" : "str_index_analyzer"


}
 }
}
  },

  "settings" : {
"analysis" : {
  "analyzer" : {
 "str_search_analyzer" : {

  "tokenizer" : "standard",
  "filter" : ["lowercase", "asciifolding"]
},

"str_index_analyzer" : {
  "tokenizer" : "standard",
  "filter" : ["lowercase", "substring","asciifolding"]
}
  },

  "filter" : {

"substring" : {
"token_chars" :[],
  "type" : "nGram",
  "min_gram" : 3,
  "max_gram"  : 250
} 
 
  
  
  }
}
  }
}

>>
{
   "acknowledged": true
}


PUT /_river/source/_meta
{
"type" : "jdbc",
"jdbc" : {

"url" : "jdbc:mysql:/192.168.50.62:9200/my_index",
"user" : "user",
"password" : "my-password",
"sql" : "select id_source, titre_source, desc_source, note_source, 
adresse_source, type_source from source",
"index" : "my_index",
"type" : "source",
"max_bulk_requests" : 5  


}
}

>>

[2014-05-21 10:21:27,134][INFO 
][org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth] new 
bulk [1] of [99 items], 1 outstanding bulk requests
[2014-05-21 10:21:28,813][INFO 
][org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth] new 
bulk [2] of [99 items], 2 outstanding bulk requests
[2014-05-21 10:21:31,518][WARN ][monitor.jvm  ] [ik-test1] 
[gc][young][179][4] duration [1.7s], collections [1]/[3.2s], total 
[1.7s]/[5.5s], memory [78.5mb]->[26.7mb]/[1015.6mb], all_pools {[young] 
[54.9mb]->[1.1mb]/[66.5mb]}{[survivor] [8.2mb]->[7.7mb]/[8.3mb]}{[old] 
[15.4mb]->[17.8mb]/[940.8mb]}
[2014-05-21 10:21:31,511][INFO 
][org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth] new 
bulk [3] of [4 items], 3 outstanding bulk requests
[2014-05-21 10:21:31,844][INFO 
][org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth] new 
bulk [4] of [99 items], 4 outstanding bulk requests
[2014-05-21 10:21:32,159][INFO 
][org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth] new 
bulk [5] of [99 items], 5 outstanding bulk requests
[2014-05-21 10:21:33,853][INFO 
][org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth] bulk 
[1] success [99 items] [6707ms]
[2014-05-21 10:21:33,853][INFO 
][org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth] new 
bulk [6] of [99 items], 5 outstanding bulk requests
[2014-05-21 10:21:36,575][INFO 
][org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth] bulk 
[2] success [99 items] [7762ms]
[2014-05-21 10:21:36,584][INFO 
][org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth] new 
bulk [7] of [99 items], 5 outstanding bulk requests
[2014-05-21 10:21:36,707][INFO 
][org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth] bulk 
[3] success [4 items] [5172ms]
[2014-05-21 10:21:36,886][INFO 
][org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth] new 
bulk [8] of [99 items], 5 outstanding bulk requests
[2014-05-21 10:21:37,993][INFO 
][org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth] bulk 
[4] success [99 items] [6149ms]
[2014-05-21 10:21:37,999][INFO 
][org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth] new 
bulk [9] of [99 items], 5 outstanding bulk requests
[2014-05-21 10:21:38,576][INFO 
][org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth] bulk 
[5] success [99 items] [6416ms]
[2014-05-21 10:21:38,579][INFO 
][org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth] new 
bulk [10] of [99 items], 5 outstanding bulk requests
[2014-05-21 10:21:39,451][INFO 
][org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverMouth] bulk 
[6] success [99 items] [5598ms]
[

Re: problem stop indexing data when I restart Elasticsearch ?

2014-05-15 Thread Tanguy Bernard
Thank you very much Jörg. It works perfectly.

Tanguy

On Thursday, May 15, 2014 at 15:17:04 UTC+2, Jörg Prante wrote:
>
> After you have indexed your data and all the work is done, you should 
> remove the river.
>
> curl -XDELETE '0:9200/_river/user/'
>
> Otherwise, the river will be automatically started again when the node 
> starts again.
>
> Jörg
>
>
> On Thu, May 15, 2014 at 2:59 PM, Tanguy Bernard 
> 
> > wrote:
>
>> Hello, 
>> I indexed my data: everything was fine, but when I restart Elasticsearch, 
>> it re-indexes my data. My problem is that I end up with the same data twice. 
>> Can you help me solve this problem? 
>>
>> Thanks in advance.
>>
>> My code :
>>
>> PUT /my_index/_mapping/user
>> {
>>   "mappings" : {
>> "user" : {
>>  "properties" : {
>> 
>> "name_user":{
>> "type":"string"
>> }
>>
>>  }
>> }
>>   },
>>
>>   "settings" : {
>> "analysis" : {
>>   "analyzer" : {
>>  "str_search_analyzer" : {
>>
>>   "tokenizer" : "standard",
>>   "filter" : ["lowercase", "asciifolding"]
>> },
>>
>> "str_index_analyzer" : {
>>   "tokenizer" : "standard",
>>   "filter" : ["lowercase","asciifolding"]
>> }
>>   },
>>
>> 
>> }
>>   }
>> }
>>
>>
>> PUT /_river/user/_meta
>> {
>> "type" : "jdbc",
>> "jdbc" : {
>>
>> "url" : "my_url",
>> "user" : "user",
>> "password" : "password",
>> "sql" : "select name_user from user",
>> "index" : "my_index",
>> "type" : "user",
>> "max_bulk_requests" : 5  
>>
>>
>> }
>> }
>>
>>
>
>

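Jörg's `curl -XDELETE '0:9200/_river/user/'` has a PHP-client counterpart. A sketch of the parameters only (an assumption on my part: deleting the river's _meta document via `$client->delete()` stops the river from restarting, while Jörg's curl form removes the whole /_river/user type):

```php
<?php
// Parameters for removing the river definition once indexing is done,
// so the river does not start again on node restart.
$params = array(
    'index' => '_river',
    'type'  => 'user',
    'id'    => '_meta',
);

echo json_encode($params), "\n";
// With the client from the earlier messages: $client->delete($params);
```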


Re: problem stop indexing data when I restart Elasticsearch ?

2014-05-15 Thread Tanguy Bernard
Thank you very much Jörg. It works perfectly.

Tanguy 



problem stop indexing data when I restart Elasticsearch ?

2014-05-15 Thread Tanguy Bernard
Hello,
I indexed my data: everything was fine, but when I restart Elasticsearch, it 
re-indexes my data. My problem is that I end up with the same data twice.
Can you help me solve this problem?

Thanks in advance.

My code :

PUT /my_index/_mapping/user
{
  "mappings" : {
"user" : {
 "properties" : {

"name_user":{
"type":"string"
}
   
 }
}
  },

  "settings" : {
"analysis" : {
  "analyzer" : {
 "str_search_analyzer" : {

  "tokenizer" : "standard",
  "filter" : ["lowercase", "asciifolding"]
},

"str_index_analyzer" : {
  "tokenizer" : "standard",
  "filter" : ["lowercase","asciifolding"]
}
  },


}
  }
}


PUT /_river/user/_meta
{
"type" : "jdbc",
"jdbc" : {

"url" : "my_url",
"user" : "user",
"password" : "password",
"sql" : "select name_user from user",
"index" : "my_index",
"type" : "user",
"max_bulk_requests" : 5  


}
}



Re: index and search pdf file with elasticsearch php client

2014-04-08 Thread Tanguy Bernard
I found the answer:

$params2 =array();

$params2['body']['query']['text']['file'] = 'my words';
$params2['body']['highlight']['fields']['file'] = array("term_vector" => 
"with_positions_offsets");
$results = $client->search($params2);
print_r($results);


On Tuesday, April 8, 2014 at 10:22:21 UTC+2, Tanguy Bernard wrote:
>
> Hello,
> Recently, I found some very helpful information here:
> https://gist.github.com/lukas-vlcek/1075067
>
> I would like to reproduce the same indexing and searching with php 
> ElasticSearch client.
> My indexing seems to work!
>
> <?php require_once 'vendor/autoload.php';
> $client = new Elasticsearch\Client();
>
> $doc_src = "fn6742.pdf";
> $binary = fread(fopen($doc_src, "r"), filesize($doc_src));
> $doc_str = base64_encode($binary);
>
>
> $article = array(); 
> $article['index'] = 'index2';
> $article['type']  = 'attachment';
> $article['body']  = array('file' => $doc_str);
>
> $result = $client->index($article);
>
> ?>
>
>
> But my "search" does not work. I would like to find the sentence where my 
> world is.
> I tried this :
>
> $params2['body']['query']['match']['file'] = 'my word';
> $results = $client->search($params2);
> print_r($results);
>
> And I would like something like this: "file" : [ " It's my word 
> / You can't use my word / because " ]
>
>
> I hope you can help me?
>
> Thanks in advance
>
>

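Building on the working query above: the fragments come back under `hits.hits[n].highlight.file` in the search response. A sketch of pulling them out (the `$results` array below is a hand-made stand-in for `$client->search($params2)`, not a live response):

```php
<?php
// Shape of an Elasticsearch search response with highlighting enabled,
// reduced to the parts this thread cares about (stand-in data).
$results = array(
    'hits' => array(
        'hits' => array(
            array(
                '_id'       => 'my_id',
                'highlight' => array(
                    'file' => array(" It's <em>my word</em> / You can't use <em>my word</em> / because "),
                ),
            ),
        ),
    ),
);

// Collect every highlighted fragment across the matching documents.
$fragments = array();
foreach ($results['hits']['hits'] as $hit) {
    if (isset($hit['highlight']['file'])) {
        foreach ($hit['highlight']['file'] as $fragment) {
            $fragments[] = $fragment;
        }
    }
}

print_r($fragments);
```

By default the matched terms are wrapped in `<em>` tags inside each fragment.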


index and search pdf file with elasticsearch php client

2014-04-08 Thread Tanguy Bernard
Hello,
Recently, I found some very helpful information here:
https://gist.github.com/lukas-vlcek/1075067

I would like to reproduce the same indexing and searching with php 
ElasticSearch client.
My indexing seems to work!

<?php
require_once 'vendor/autoload.php';
$client = new Elasticsearch\Client();

$doc_src = "fn6742.pdf";
$binary = fread(fopen($doc_src, "r"), filesize($doc_src));
$doc_str = base64_encode($binary);

$article = array(); 
$article['index'] = 'index2';
$article['type']  = 'attachment';
$article['body']  = array('file' => $doc_str);

$result = $client->index($article);

?>


But my "search" does not work. I would like to find the sentence where my 
world is.
I tried this :

$params2['body']['query']['match']['file'] = 'my word';
$results = $client->search($params2);
print_r($results);

And I would like something like this: "file" : [ " It's my word 
/ You can't use my word / because " ]


I hope you can help me?

Thanks in advance
