Thank you so much for your quick reply. Here is what I have done:

1. Installed hadoop-1.2.1 (pig-0.12.0 / hive-0.11.0 / ...)
2. Downloaded Elasticsearch-1.0.1 and put it in the same directory as Hadoop
3. Copied the following 4 elasticsearch-hadoop jar files to /pig and hadoop/lib:
   elasticsearch-hadoop-1.3.0.M2.jar
   elasticsearch-hadoop-1.3.0.M2-sources.jar
   elasticsearch-hadoop-1.3.0.M2-javadoc.jar
   elasticsearch-hadoop-1.3.0.M2-yarn.jar
4. Added them to the PIG_CLASSPATH
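For reference, step 4 can be sketched like this (the jar location is a placeholder taken from step 3; adjust it to your actual install path):

```shell
# append the es-hadoop jar to Pig's classpath so EsStorage is found;
# "$HOME/hadoop/lib" is a placeholder - use the directory from step 3
export PIG_CLASSPATH="$PIG_CLASSPATH:$HOME/hadoop/lib/elasticsearch-hadoop-1.3.0.M2-yarn.jar"
echo "$PIG_CLASSPATH"
```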
When I take data from my Desktop and put it into Elasticsearch using a Pig script it works very well, but when I try to read data from HDFS it gives me this:

2014-05-12 23:16:31,765 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR: java.io.IOException: Out of nodes and retries; caught exception
2014-05-12 23:16:31,765 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2014-05-12 23:16:31,766 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion  PigVersion  UserId  StartedAt            FinishedAt           Features
1.2.1          0.12.0      hduser  2014-05-12 23:15:34  2014-05-12 23:16:31  GROUP_BY

Failed!

Failed Jobs:
JobId                  Alias                              Feature            Message                 Outputs
job_201405122310_0001  weblog_count,weblog_group,weblogs  GROUP_BY,COMBINER  Message: Job failed! Error - # of failed Reduce Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201405122310_0001_r_000000  weblogs1/logs2,

Input(s):
Failed to read data from "/user/weblogs"

Output(s):
Failed to produce result in "weblogs1/logs2"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201405122310_0001

2014-05-12 23:16:31,766 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
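In my experience, "Out of nodes and retries" usually means the map/reduce tasks cannot reach any Elasticsearch node. One thing to try is pointing EsStorage explicitly at the ES node via its connection settings; a minimal sketch (the address 127.0.0.1:9200 is a placeholder, use the host/port your ES node actually binds to):

```pig
-- same STORE as in the script, but with an explicit ES address so every
-- Hadoop task node knows where to connect (address below is a placeholder)
STORE weblog_count INTO 'weblogs1/logs2'
    USING org.elasticsearch.hadoop.pig.EsStorage('es.nodes=127.0.0.1', 'es.port=9200');
```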
And here is the script:

weblogs = LOAD '/user/weblogs' USING PigStorage('\t') AS (
    client_ip : chararray,
    full_request_date : chararray,
    day : int,
    month : chararray,
    month_num : int,
    year : int,
    hour : int,
    minute : int,
    second : int,
    timezone : chararray,
    http_verb : chararray,
    uri : chararray,
    http_status_code : chararray,
    bytes_returned : chararray,
    referrer : chararray,
    user_agent : chararray );
weblog_group = GROUP weblogs by (client_ip, year, month_num);
weblog_count = FOREACH weblog_group GENERATE group.client_ip, group.year, group.month_num, COUNT_STAR(weblogs) as pageviews;
STORE weblog_count INTO 'weblogs1/logs2' USING org.elasticsearch.hadoop.pig.EsStorage();

On Monday, May 12, 2014 at 16:28:20 UTC+1, Costin Leau wrote:
>
> Check your network settings and make sure that the Hadoop nodes can
> communicate with the ES nodes.
> If you install ES beside Hadoop itself, this shouldn't be a problem.
> There are various ways to check this - try ping, tracert, etc...
>
> Please refer to your distro manual/documentation for more information
> about the configuration and setup.
>
> Cheers,
>
> On 5/12/14 3:42 PM, hanine haninne wrote:
> > I got the same error but I don't know what I have to change in my
> > "/etc/hosts". Thank you for your help.
> >
> > On Wednesday, March 5, 2014 at 09:39:46 UTC, Yann Barraud wrote:
> >
> > Hi,
> >
> > Is your ES instance known to your Hadoop cluster (/etc/hosts)?
> >
> > It does not even seem to read from it.
> >
> > Cheers,
> > Yann
> >
> > On Wednesday, March 5, 2014 at 06:32:55 UTC+1, siva mannem wrote:
> >
> > I installed ES (at /usr/lib/elasticsearch/) on our gateway server and I am able to run some basic
> > curl commands like XPUT and XGET to create some indices and retrieve the data in them.
> > I am able to index a single-line JSON record, but I am unable to give a JSON file as input to curl XPUT.
> > Can anybody give me the syntax for passing a JSON file as input to the curl XPUT command?
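To answer that last question: curl can read the request body from a file with `--data-binary @file`. A minimal sketch, assuming an ES node listening on localhost:9200 (the index/type/id `usa/ca/1` is just an example):

```shell
# recreate the sample document from the thread
cat > j.json <<'EOF'
{"k1":"v1" , "k2":"v2" , "k3":"v3"}
EOF

# PUT the file contents as the document body; "@" tells curl to read the
# body from the file, and --data-binary sends it unmodified
curl -s -XPUT 'http://localhost:9200/usa/ca/1' --data-binary @j.json \
  || echo "could not reach an ES node on localhost:9200"
```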
> > My next issue is that I copied the following 4 elasticsearch-hadoop jar files
> >   elasticsearch-hadoop-1.3.0.M2.jar
> >   elasticsearch-hadoop-1.3.0.M2-sources.jar
> >   elasticsearch-hadoop-1.3.0.M2-javadoc.jar
> >   elasticsearch-hadoop-1.3.0.M2-yarn.jar
> > to /usr/lib/elasticsearch/elasticsearch-0.90.9/lib
> > and /usr/lib/gphd/pig/
> >
> > I have the following json file j.json
> > ++++++
> > {"k1":"v1" , "k2":"v2" , "k3":"v3"}
> > ++++++
> > in my_hdfs_path.
> >
> > My pig script is write_data_to_es.pig:
> > +++++++++++++
> > REGISTER /usr/lib/gphd/pig/elasticsearch-hadoop-1.3.0.M2-yarn.jar;
> > DEFINE ESTOR org.elasticsearch.hadoop.pig.EsStorage('es.resource=usa/ca');
> > A = LOAD '/my_hdfs_path/j.json' USING JsonLoader('k1:chararray,k2:chararray,k3:chararray');
> > STORE A INTO 'usa/ca' USING ESTOR('es.input.json=true');
> > ++++++++++++++
> >
> > When I run my pig script
> > +++++++++
> > pig -x mapreduce write_data_to_es.pig
> > ++++++++++++
> > I get the following error:
> > +++++++++
> > Input(s):
> > Failed to read data from "/my_hdfs_path/j.json"
> >
> > Output(s):
> > Failed to produce result in "usa/ca"
> >
> > Counters:
> > Total records written : 0
> > Total bytes written : 0
> > Spillable Memory Manager spill count : 0
> > Total bags proactively spilled: 0
> > Total records proactively spilled: 0
> >
> > Job DAG:
> > job_1390436301987_0089
> >
> > 2014-03-05 00:26:50,839 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
> > 2014-03-05 00:26:50,841 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Input(s):
> > Failed to read data from "/elastic_search/es_hadoop_test.json"
> >
> > Output(s):
> > Failed to produce result in "mannem/siva"
> >
> > Counters:
> > Total records written : 0
> > Total bytes written : 0
> > Spillable Memory Manager spill count : 0
> > Total bags proactively spilled: 0
> > Total records proactively spilled: 0
> >
> > Job DAG:
> > job_1390436301987_0089
> >
> > 2014-03-05 00:26:50,839 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
> > 2014-03-05 00:26:50,841 [main] ERROR org.apache.pig.tools.grunt.GruntParser - *ERROR 2997: Encountered IOException. Out of nodes and retries; caught exception*
> > Details at logfile: /usr/lib/elasticsearch/elasticsearch-0.90.9/pig_1393997175206.log
> > ++++++++++++
> >
> > I am using the Pivotal Hadoop version (1.0.1), which is basically Apache Hadoop (hadoop-2.0.2);
> > the Pig version is 0.10.1 and the Elasticsearch version is 0.90.9.
> >
> > Can anybody help me out here?
> > Thank you so much in advance for your help.
> >
> > --
> > You received this message because you are subscribed to the Google Groups "elasticsearch" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
> > To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1dd8ff7d-ef53-4614-9300-13b5f6ed66fa%40googlegroups.com.
> > For more options, visit https://groups.google.com/d/optout.
>
> --
> Costin

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/cd9d3143-556a-43c8-9cfd-78b666db48b7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.