Re: InputFormat and InputSplit - Network location name contains /:
Hi Harsh,

Many thanks! I got rid of the problem by changing the InputSplit's getLocations() to return hostnames instead of the file path.

Patcharee

On 04/11/2014 06:16 AM, Harsh J wrote:
> Do not use the InputSplit's getLocations() API to supply your file path; it is not intended for such things.
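For reference, here is a minimal sketch of where such hostnames usually come from. The mapreduce-API FileInputFormat takes them from block metadata (FileSystem.getFileBlockLocations(...) and BlockLocation.getHosts()), roughly by using the hosts of the block that contains the split's start offset. To stay dependency-free, the sketch below uses a hypothetical Block class as a stand-in for BlockLocation, and the names SplitHosts and hostsFor are illustrative only.

```java
import java.util.List;

/**
 * Sketch: pick locality hints for a split from per-block host lists.
 * In a real InputFormat the blocks would come from FileSystem.getFileBlockLocations(...)
 * and the hosts from BlockLocation.getHosts(); Block here is a hypothetical stand-in.
 */
class SplitHosts {

    /** Hypothetical stand-in for org.apache.hadoop.fs.BlockLocation. */
    static final class Block {
        final long offset;     // byte offset of the block within the file
        final long length;     // block length in bytes
        final String[] hosts;  // bare hostnames such as "dn1.local", never paths or URIs

        Block(long offset, long length, String... hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    /** Returns the hosts of the block containing splitStart, for use in getLocations(). */
    static String[] hostsFor(List<Block> blocks, long splitStart) {
        for (Block b : blocks) {
            if (splitStart >= b.offset && splitStart < b.offset + b.length) {
                return b.hosts.clone(); // defensive copy so callers cannot mutate the block
            }
        }
        throw new IllegalArgumentException("offset " + splitStart + " is outside the file");
    }
}
```

The returned array is what a split would hand back from getLocations(); the file path itself stays in a separate field of the split.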
Re: InputFormat and InputSplit - Network location name contains /:
Do not use the InputSplit's getLocations() API to supply your file path, if that's what you've done in your current InputFormat implementation; it is not intended for such things.

If you're looking to store a single file path, use the FileSplit class, or, if it is not as simple as that, use it as a base reference to build your own Path-based InputSplit derivative. Its sources are at https://github.com/apache/hadoop-common/blob/release-2.4.0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/FileSplit.java. Look at the Writable method overrides in particular to understand how to use custom fields.

On Thu, Apr 10, 2014 at 9:54 PM, Patcharee Thongtra wrote:
> I wrote a custom InputFormat and InputSplit to handle NetCDF files.

--
Harsh J
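A minimal sketch of the approach described above, modelled on FileSplit: the file path lives in its own field and is serialized through write()/readFields(), while getLocations() returns only hostnames. To keep it compilable without Hadoop on the classpath, the class does not extend org.apache.hadoop.mapreduce.InputSplit or implement org.apache.hadoop.io.Writable, and it uses DataOutput.writeUTF where FileSplit uses Text.writeString. The name NetcdfSplit is hypothetical.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * Sketch of a path-based split in the style of FileSplit. In a real job this class
 * would extend InputSplit and implement Writable; those supertypes are omitted here.
 */
class NetcdfSplit {
    private String path;    // the input file (e.g. an hdfs:// URI); never returned as a location
    private long start;     // first byte of the split
    private long length;    // number of bytes in the split
    private String[] hosts; // bare hostnames, used only as locality hints

    NetcdfSplit() { }       // Writable types need a no-arg constructor for deserialization

    NetcdfSplit(String path, long start, long length, String[] hosts) {
        this.path = path;
        this.start = start;
        this.length = length;
        this.hosts = hosts;
    }

    String getPath() { return path; }
    long getStart() { return start; }
    long getLength() { return length; }

    /** Locality hints: hostnames only, never a path or URI. */
    String[] getLocations() {
        return hosts == null ? new String[0] : hosts.clone();
    }

    /** Serialize the custom fields, in a fixed order. */
    void write(DataOutput out) throws IOException {
        out.writeUTF(path);
        out.writeLong(start);
        out.writeLong(length);
    }

    /** Read the fields back in the same order they were written. */
    void readFields(DataInput in) throws IOException {
        path = in.readUTF();
        start = in.readLong();
        length = in.readLong();
        hosts = null; // as in FileSplit, hosts are not serialized with the split
    }
}
```

The key design point is the separation of concerns: the path is payload for the record reader, while getLocations() only ever feeds the scheduler's rack resolution.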
InputFormat and InputSplit - Network location name contains /:
Hi,

I wrote a custom InputFormat and InputSplit to handle NetCDF files, which I use with a custom Pig Load function. When I submitted a job by running a Pig script, I got the error below. According to the error log, the network location name is "hdfs://service-1-0.local:8020/user/patcharee/netcdf_data/wrfout_d02" (my input file), which contains "/", and Hadoop does not allow that.

It could be something missing in my custom InputFormat and InputSplit. Any ideas?

Any help is appreciated,

Patcharee

2014-04-10 17:09:01,854 INFO [CommitterEvent Processor #0] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_SETUP
2014-04-10 17:09:01,918 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1387474594811_0071Job Transitioned from SETUP to RUNNING
2014-04-10 17:09:01,982 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved hdfs://service-1-0.local:8020/user/patcharee/netcdf_data/wrfout_d02 to /default-rack
2014-04-10 17:09:01,984 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
java.lang.IllegalArgumentException: Network location name contains /: hdfs://service-1-0.local:8020/user/patcharee/netcdf_data/wrfout_d02
    at org.apache.hadoop.net.NodeBase.set(NodeBase.java:87)
    at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:65)
    at org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:111)
    at org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:95)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.<init>(TaskAttemptImpl.java:548)
    at org.apache.hadoop.mapred.MapTaskAttemptImpl.<init>(MapTaskAttemptImpl.java:47)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.MapTaskImpl.createAttempt(MapTaskImpl.java:62)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.addAttempt(TaskImpl.java:594)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.addAndScheduleAttempt(TaskImpl.java:581)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.access$1300(TaskImpl.java:100)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl$InitialScheduleTransition.transition(TaskImpl.java:871)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl$InitialScheduleTransition.transition(TaskImpl.java:866)
    at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:632)
    at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl.handle(TaskImpl.java:99)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:1237)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher.handle(MRAppMaster.java:1231)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
    at java.lang.Thread.run(Thread.java:662)
2014-04-10 17:09:01,986 INFO [AsyncDispatcher event handler] org.apache.hadoop.
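The failure in this log happens only once the ApplicationMaster tries to schedule the map task, which makes it slow to diagnose. A cheap guard is to check split locations before submitting the job. The sketch below mirrors the rule from the trace (NodeBase.set rejects a name containing "/"); rejecting null as well is an extra choice of this sketch, and the class and method names are hypothetical.

```java
/**
 * Sketch of a pre-submission check for InputSplit locations. The "/" rule mirrors
 * the IllegalArgumentException thrown from NodeBase.set in the log; the null check is extra.
 */
class LocationCheck {

    /** Throws if any location looks like a path or URI rather than a bare hostname. */
    static void requireHostnames(String[] locations) {
        for (String loc : locations) {
            if (loc == null || loc.contains("/")) {
                throw new IllegalArgumentException("Split location is not a hostname: " + loc);
            }
        }
    }
}
```

Calling requireHostnames(split.getLocations()) on each split in a unit test of the custom InputFormat would have flagged the hdfs:// URI above without starting a job.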