Hey Jorge,

Thanks for responding.
Can you elaborate on the user permission part? HDFS or local? As of now, the HDFS path hdfs://n2pl-pa-hdn220.xxx.xxx:8020/user/yarn/.sparkStaging/application_1521457397747_0013/__spark_libs__8247917347016008883.zip already has full access for the yarn user, and my job is also running as the same user.

Thanks,
Nayan

> On Mar 22, 2018, at 12:54 PM, Jorge Machado <jom...@me.com> wrote:
>
> Seems to me like a permissions problem! Can you check your user / folder permissions?
>
> Jorge Machado
>
>> On 22 Mar 2018, at 08:21, nayan sharma <nayansharm...@gmail.com> wrote:
>>
>> Hi All,
>> Druid uses Hadoop MapReduce to ingest batch data, but I am trying Spark for ingesting data into Druid, taking reference from https://github.com/metamx/druid-spark-batch
>> We are stuck at the following error.
>>
>> Application log:
>> 2018-03-20T07:54:28,782 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Will allocate AM container, with 896 MB memory including 384 MB overhead
>> 2018-03-20T07:54:28,782 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Setting up container launch context for our AM
>> 2018-03-20T07:54:28,785 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Setting up the launch environment for our AM container
>> 2018-03-20T07:54:28,793 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Preparing resources for our AM container
>> 2018-03-20T07:54:29,364 WARN [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
>> 2018-03-20T07:54:29,371 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Uploading resource file:/hdfs1/druid-0.11.0/var/tmp/spark-49af67df-1a21-4790-a02b-c737c7a44946/__spark_libs__8247917347016008883.zip -> hdfs://n2pl-pa-hdn220.xxx.xxx:8020/user/yarn/.sparkStaging/application_1521457397747_0013/__spark_libs__8247917347016008883.zip
>> 2018-03-20T07:54:29,607 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Uploading resource file:/hdfs1/druid-0.11.0/var/tmp/spark-49af67df-1a21-4790-a02b-c737c7a44946/__spark_conf__2240950972346324291.zip -> hdfs://n2pl-pa-hdn220.xxx.xxx:8020/user/yarn/.sparkStaging/application_1521457397747_0013/__spark_conf__.zip
>> 2018-03-20T07:54:29,673 INFO [task-runner-0-priority-0] org.apache.spark.SecurityManager - Changing view acls to: yarn
>> 2018-03-20T07:54:29,673 INFO [task-runner-0-priority-0] org.apache.spark.SecurityManager - Changing modify acls to: yarn
>> 2018-03-20T07:54:29,673 INFO [task-runner-0-priority-0] org.apache.spark.SecurityManager - Changing view acls groups to:
>> 2018-03-20T07:54:29,673 INFO [task-runner-0-priority-0] org.apache.spark.SecurityManager - Changing modify acls groups to:
>> 2018-03-20T07:54:29,673 INFO [task-runner-0-priority-0] org.apache.spark.SecurityManager - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn); groups with view permissions: Set(); users with modify permissions: Set(yarn); groups with modify permissions: Set()
>> 2018-03-20T07:54:29,679 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Submitting application application_1521457397747_0013 to ResourceManager
>> 2018-03-20T07:54:29,709 INFO [task-runner-0-priority-0] org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1521457397747_0013
>> 2018-03-20T07:54:29,713 INFO [task-runner-0-priority-0] org.apache.spark.scheduler.cluster.SchedulerExtensionServices - Starting Yarn extension services with app application_1521457397747_0013 and attemptId None
>> 2018-03-20T07:54:30,722 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Application report for application_1521457397747_0013 (state: FAILED)
>> 2018-03-20T07:54:30,729 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client -
>> client token: N/A
>> diagnostics: Application application_1521457397747_0013 failed 2 times due to AM Container for appattempt_1521457397747_0013_000002 exited with exitCode: -1000
>> For more detailed output, check the application tracking page: http://n-pa-hdn220.xxx.xxxx:8088/cluster/app/application_1521457397747_0013
>> Then click on links to logs of each attempt.
>> Diagnostics: No such file or directory
>> ENOENT: No such file or directory
>> at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmodImpl(Native Method)
>> at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmod(NativeIO.java:230)
>> at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:756)
>> at org.apache.hadoop.fs.DelegateToFileSystem.setPermission(DelegateToFileSystem.java:211)
>> at org.apache.hadoop.fs.FilterFs.setPermission(FilterFs.java:252)
>> at org.apache.hadoop.fs.FileContext$11.next(FileContext.java:1003)
>> at org.apache.hadoop.fs.FileContext$11.next(FileContext.java:999)
>> at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>> at org.apache.hadoop.fs.FileContext.setPermission(FileContext.java:1006)
>> at org.apache.hadoop.yarn.util.FSDownload$3.run(FSDownload.java:421)
>> at org.apache.hadoop.yarn.util.FSDownload$3.run(FSDownload.java:419)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
>> at org.apache.hadoop.yarn.util.FSDownload.changePermissions(FSDownload.java:419)
>> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:365)
>> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> at java.lang.Thread.run(Thread.java:748)
>>
>> As far as I can understand, there is something wrong with the job submission through YARN.
>>
>> It runs on the local machine, but on the HDP cluster it fails with this error.
>>
>> <yarnlogs.txt>
>>
>> Thanks,
>> Nayan
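One detail worth noting from the stack trace: the ENOENT is thrown in RawLocalFileSystem.setPermission inside FSDownload.changePermissions, i.e. while chmod-ing a file on the NodeManager's local disk during container localization. So the permissions worth checking are the local yarn.nodemanager.local-dirs on each worker host, not only the HDFS staging path. A minimal sketch of such a check follows; the directory list is a placeholder assumption, and the real values must be read from yarn-site.xml on every NodeManager:

```shell
#!/bin/sh
# Sketch: verify that each assumed yarn.nodemanager.local-dirs entry exists
# and is writable by the current user. "/tmp" and "/data1/yarn/local" are
# placeholder examples only; substitute the actual values from yarn-site.xml
# on each NodeManager host.
LOCAL_DIRS="/tmp /data1/yarn/local"

for d in $LOCAL_DIRS; do
  if [ -d "$d" ] && [ -w "$d" ]; then
    echo "OK: $d"
  else
    echo "MISSING OR NOT WRITABLE: $d"
  fi
done

# The HDFS side can be inspected separately, e.g.:
#   hdfs dfs -ls /user/yarn/.sparkStaging
```

Run as the yarn user on each worker host; a MISSING line on any node would explain an exitCode -1000 / ENOENT during localization even when the HDFS staging directory itself is fully accessible.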