Re: Creating Dataframe by querying Impala

2017-06-01 Thread Anubhav Agarwal
The issue seems to be with the primordial class loader. I cannot load the drivers onto all the nodes at the same location, but I have loaded the jars to HDFS. I have tried SPARK_YARN_DIST_FILES as well as SPARK_CLASSPATH on the edge node with no luck. Is there another way to load these jars through

Re: removing columns from file

2017-04-28 Thread Anubhav Agarwal
Are you using Spark's textFiles method? If so, go through this blog :- http://tech.kinja.com/how-not-to-pull-from-s3-using-apache-spark-1704509219 Anubhav On Mon, Apr 24, 2017 at 12:48 PM, Afshin, Bardia < bardia.afs...@capitalone.com> wrote: > Hi there, > > > > I have a process that downloads

SLF4J binding error while running Spark using YARN as Cluster Manager

2016-05-18 Thread Anubhav Agarwal
Hi, I am having log4j trouble while running Spark using YARN as cluster manager in CDH 5.3.3. I get the following error:- SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in

Re: Improve parquet write speed to HDFS and spark.sql.execution.id is already set ERROR

2015-11-03 Thread Anubhav Agarwal
. On Tue, Nov 3, 2015 at 7:48 AM, Ted Yu <yuzhih...@gmail.com> wrote: > I am a bit curious: why is the synchronization on finalLock needed? > > Thanks > > On Oct 23, 2015, at 8:25 AM, Anubhav Agarwal <anubha...@gmail.com> wrote: > > I have a spark job that

Improve parquet write speed to HDFS and spark.sql.execution.id is already set ERROR

2015-10-23 Thread Anubhav Agarwal
I have a Spark job that creates 6 million rows in RDDs. I convert each RDD into a DataFrame and write it to HDFS. Currently it takes 3 minutes to write it to HDFS. Here is the snippet:- RDDList.parallelStream().forEach(mapJavaRDD -> { if (mapJavaRDD != null) {

Re: [jira] Ankit shared "SPARK-11213: Documentation for remote spark Submit for R Scripts from 1.5 on CDH 5.4" with you

2015-10-22 Thread Anubhav Agarwal
Hi Ankit, Here is my solution for this:- 1) Download the latest Spark 1.5.1 (I just copied the following link from spark.apache.org; if it doesn't work, grab a new one from the website.) wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz 2) Unzip the folder and rename/move

Application failed error

2015-08-11 Thread Anubhav Agarwal
I am running Spark 1.3 on CDH 5.4 stack. I am getting the following error when I spark-submit my application:- 15/08/11 16:03:49 INFO Remoting: Starting remoting 15/08/11 16:03:49 INFO Remoting: Remoting started; listening on addresses

NullPointException Help while using accumulators

2015-08-03 Thread Anubhav Agarwal
Hi, I am trying to modify my code to use HDFS and multiple nodes. The code works fine when I run it locally on a single machine with a single worker. I have been trying to modify it and I get the following error. Any hint would be helpful. java.lang.NullPointerException at

Re: NullPointException Help while using accumulators

2015-08-03 Thread Anubhav Agarwal
do you use ? Cheers On Mon, Aug 3, 2015 at 3:13 PM, Anubhav Agarwal anubha...@gmail.com wrote: Hi, I am trying to modify my code to use HDFS and multiple nodes. The code works fine when I run it locally in a single machine with a single worker. I have been trying to modify it and I get

Re: Spark-thriftserver Issue

2015-03-24 Thread Anubhav Agarwal
Zhan, specifying the port fixed the port issue. Is it possible to specify the log directory while starting the Spark thriftserver? I am still getting this error even though the folder exists and everyone has permission to use that directory. drwxr-xr-x 2 root root 4096 Mar 24 19:04