Spark Druid Ingestion

2018-03-22 Thread nayan sharma
Hi All,

Druid uses Hadoop MapReduce to ingest batch data, but I am trying Spark for ingesting data into Druid, taking reference from https://github.com/metamx/druid-spark-batch

But we are stuck at the following error.

Application Log:
2018-03-20T07:54:28,782 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Will allocate AM container, with 896 MB memory including 384 MB overhead
2018-03-20T07:54:28,782 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Setting up container launch context for our AM
2018-03-20T07:54:28,785 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Setting up the launch environment for our AM container
2018-03-20T07:54:28,793 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Preparing resources for our AM container
2018-03-20T07:54:29,364 WARN [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2018-03-20T07:54:29,371 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Uploading resource file:/hdfs1/druid-0.11.0/var/tmp/spark-49af67df-1a21-4790-a02b-c737c7a44946/__spark_libs__8247917347016008883.zip -> hdfs://n2pl-pa-hdn220.xxx.xxx:8020/user/yarn/.sparkStaging/application_1521457397747_0013/__spark_libs__8247917347016008883.zip
2018-03-20T07:54:29,607 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Uploading resource file:/hdfs1/druid-0.11.0/var/tmp/spark-49af67df-1a21-4790-a02b-c737c7a44946/__spark_conf__2240950972346324291.zip -> hdfs://n2pl-pa-hdn220.xxx.xxx:8020/user/yarn/.sparkStaging/application_1521457397747_0013/__spark_conf__.zip
2018-03-20T07:54:29,673 INFO [task-runner-0-priority-0] org.apache.spark.SecurityManager - Changing view acls to: yarn
2018-03-20T07:54:29,673 INFO [task-runner-0-priority-0] org.apache.spark.SecurityManager - Changing modify acls to: yarn
2018-03-20T07:54:29,673 INFO [task-runner-0-priority-0] org.apache.spark.SecurityManager - Changing view acls groups to: 
2018-03-20T07:54:29,673 INFO [task-runner-0-priority-0] org.apache.spark.SecurityManager - Changing modify acls groups to: 
2018-03-20T07:54:29,673 INFO [task-runner-0-priority-0] org.apache.spark.SecurityManager - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn); groups with view permissions: Set(); users  with modify permissions: Set(yarn); groups with modify permissions: Set()
2018-03-20T07:54:29,679 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Submitting application application_1521457397747_0013 to ResourceManager
2018-03-20T07:54:29,709 INFO [task-runner-0-priority-0] org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1521457397747_0013
2018-03-20T07:54:29,713 INFO [task-runner-0-priority-0] org.apache.spark.scheduler.cluster.SchedulerExtensionServices - Starting Yarn extension services with app application_1521457397747_0013 and attemptId None
2018-03-20T07:54:30,722 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Application report for application_1521457397747_0013 (state: FAILED)
2018-03-20T07:54:30,729 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - 
	 client token: N/A
	 diagnostics: Application application_1521457397747_0013 failed 2 times due to AM Container for appattempt_1521457397747_0013_02 exited with  exitCode: -1000
For more detailed output, check the application tracking page: http://n-pa-hdn220.xxx.:8088/cluster/app/application_1521457397747_0013 Then click on links to logs of each attempt.
Diagnostics: No such file or directory
ENOENT: No such file or directory
	at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmodImpl(Native Method)
	at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmod(NativeIO.java:230)
	at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:756)
	at org.apache.hadoop.fs.DelegateToFileSystem.setPermission(DelegateToFileSystem.java:211)
	at org.apache.hadoop.fs.FilterFs.setPermission(FilterFs.java:252)
	at org.apache.hadoop.fs.FileContext$11.next(FileContext.java:1003)
	at org.apache.hadoop.fs.FileContext$11.next(FileContext.java:999)
	at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
	at org.apache.hadoop.fs.FileContext.setPermission(FileContext.java:1006)
	at org.apache.hadoop.yarn.util.FSDownload$3.run(FSDownload.java:421)
	at org.apache.hadoop.yarn.util.FSDownload$3.run(FSDownload.java:419)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
	at org.apache.hadoop.yarn.util.FSDownload.changePermissions(FSDownload.java:419)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:365)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:6

Re: Spark Druid Ingestion

2018-03-22 Thread Jorge Machado
This looks like a permissions problem! Can you check your user/folder
permissions?
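For context, exitCode -1000 together with an ENOENT thrown from FSDownload typically means YARN failed while localizing resources into a NodeManager's local directories, so the directories to inspect are the local ones on every worker node, not only HDFS. A minimal sketch (the path below is a placeholder; on a real cluster, use the values from yarn.nodemanager.local-dirs in yarn-site.xml):

```shell
# Report whether each given directory exists and is writable.
# FSDownload.changePermissions raises ENOENT when a localization
# directory is missing on the node that runs the container.
check_dirs() {
  for d in "$@"; do
    if [ ! -d "$d" ]; then
      echo "MISSING: $d"
    elif [ ! -w "$d" ]; then
      echo "NOT WRITABLE: $d"
    else
      echo "OK: $d"
    fi
  done
}

# Placeholder example; on the cluster, pass every entry of
# yarn.nodemanager.local-dirs and run this on every NodeManager host:
check_dirs /tmp
```

Any MISSING or NOT WRITABLE line on any node would explain the localization failure.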

Jorge Machado

> On 22 Mar 2018, at 08:21, nayan sharma wrote:
> 
> Hi All,
> As druid uses Hadoop MapReduce to ingest batch data but I am trying spark for 
> ingesting data into druid taking reference from 
> https://github.com/metamx/druid-spark-batch 
> 
> But we are stuck at the following error.

Re: Spark Druid Ingestion

2018-03-22 Thread nayan sharma
Hey Jorge,

Thanks for responding.

Can you elaborate on the user permission part? HDFS or local?

As of now, the HDFS path
hdfs://n2pl-pa-hdn220.xxx.xxx:8020/user/yarn/.sparkStaging/application_1521457397747_0013/__spark_libs__8247917347016008883.zip
already has full access for the yarn user, and my job also runs as the same user.
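
For reference, since the upload to that staging path clearly succeeded per the log, the local side is worth checking too. A rough sketch (the path is a placeholder; on the cluster one would check each directory from yarn.nodemanager.local-dirs, on every NodeManager host):

```shell
# Compare the user running the job with the owner of a local directory.
# The HDFS staging path being accessible to "yarn" does not rule out a
# missing or wrongly-owned local dir on one of the worker nodes.
owner_of() {
  # GNU stat first, fall back to BSD stat
  stat -c '%U' "$1" 2>/dev/null || stat -f '%Su' "$1"
}

echo "current user: $(id -un)"
echo "owner of /tmp: $(owner_of /tmp)"   # placeholder path
```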


Thanks,
Nayan


> On Mar 22, 2018, at 12:54 PM, Jorge Machado wrote:
> 
> This looks like a permissions problem! Can you check your user/folder
> permissions?
> 
> Jorge Machado
> 
>> On 22 Mar 2018, at 08:21, nayan sharma wrote:
>> 
>> Hi All,
>> As druid uses Hadoop MapReduce to ingest batch data but I am trying spark
>> for ingesting data into druid taking reference from
>> https://github.com/metamx/druid-spark-batch
>> 
>> But we are stuck at the following error.

Re: Spark Druid Ingestion

2018-07-02 Thread gosoy
Hi Nayan,

Were you able to resolve this issue? Is it because of some file/folder
permission problems?



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org