Hi,
Do we have any way to perform row-level operations on Spark DataFrames?
For example,
I have a dataframe with columns A, B, C, ..., Z. I want to add one more column,
"New Column", holding the sum of all the column values in each row.
A | B | C | D | ... | Z | New Column
1 | 2 | 4 | 3 | ... | 26 | 351
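A sketch of one way to do this with the DataFrame API (assuming Spark 1.4+, where `Column` supports `+`; `df` stands for the DataFrame in question):

```scala
import org.apache.spark.sql.functions.col

// Build one Column expression that adds every existing column,
// then attach it as the new column. Nulls propagate as null.
val sumExpr = df.columns.map(col).reduce(_ + _)
val withSum = df.withColumn("New Column", sumExpr)
```

If some columns can be null, wrapping each in `coalesce(col(c), lit(0))` before the reduce avoids null sums.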
Can somebody help?
Thanks,
Naveen Kumar Pokala
Hi,
I am using Spark 1.6.0.
I want to find the standard deviation of a set of columns that is determined dynamically.
val stdDevOnAll = columnNames.map { x => stddev(x) }
causalDf.groupBy(causalDf("A"),causalDf("B"),causalDf("C"))
.agg(stdDevOnAll:_*) //error line
I am trying to do as above.
But it
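For what it's worth, the error on the `agg` line is likely because `agg` is declared as `agg(expr: Column, exprs: Column*)`, so a `Seq[Column]` cannot be splatted in directly; splitting it into head and tail is the usual workaround (sketch, assuming Spark 1.6):

```scala
import org.apache.spark.sql.functions.stddev

val stdDevOnAll = columnNames.map(x => stddev(x))
// agg takes a first Column plus varargs, so pass head and tail separately:
causalDf.groupBy(causalDf("A"), causalDf("B"), causalDf("C"))
  .agg(stdDevOnAll.head, stdDevOnAll.tail: _*)
```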
Hi,
I am facing the following issue when I am connecting from spark-shell. Please
tell me how to avoid it.
15/01/29 17:21:27 ERROR Shell: Failed to locate the winutils binary in the
hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the
Hadoop
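This warning on Windows usually means HADOOP_HOME is unset; a common workaround (paths below are examples) is to download a winutils.exe matching the Hadoop version and point Hadoop at it:

```shell
rem Windows: place winutils.exe under %HADOOP_HOME%\bin (C:\hadoop is an example path)
set HADOOP_HOME=C:\hadoop
set PATH=%PATH%;%HADOOP_HOME%\bin
```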
Hi,
Has anybody tried to connect to a Spark cluster (on UNIX machines) from a Windows
interactive shell?
-Naveen.
Error from python worker:
python: module pyspark.daemon not found
PYTHONPATH was:
/home/npokala/data/spark-install/spark-master/python:
Can somebody please help me resolve this issue?
-Naveen
From: Naveen Kumar Pokala [mailto:npok...@spcapitaliq.com]
Sent: Wednesday, December 31, 2014 2:28 PM
To: user@spark.apache.org
Subject: pyspark.daemon not found
Error from python worker:
python: module pyspark.daemon not found
PYTHONPATH was:
/home/npokala/data/spark-install/spark-master
14/12/29 18:10:56 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 2,
nj09mhf0730.mhf.mhc, PROCESS_LOCAL, 1246 bytes)
14/12/29 18:10:56 INFO TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) on
executor nj09mhf0730.mhf.mhc: org.apache.spark.SparkException (
Error from python worker:
Hi.
Is there a way to submit a Spark job to a Hadoop YARN cluster from Java code?
-Naveen
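One option (assuming Spark 1.4+, which added the launcher module; the class name and paths below are examples) is `org.apache.spark.launcher.SparkLauncher`, which runs spark-submit programmatically from JVM code:

```scala
import org.apache.spark.launcher.SparkLauncher

// Launch a Spark application on YARN from JVM code.
val proc = new SparkLauncher()
  .setAppResource("/path/to/app.jar")   // example path to the application jar
  .setMainClass("com.example.Main")     // example main class
  .setMaster("yarn-client")
  .launch()                             // returns a java.lang.Process
proc.waitFor()
```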
Hi,
While submitting your Spark job, mention --executor-cores 2 --num-executors 24;
it will divide the dataset into 24*2 = 48 Parquet files.
Or set the spark.default.parallelism value, e.g. 50, on the SparkConf object; it
will divide the dataset into 50 files in your HDFS.
-Naveen
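As a sketch, the same property can be set programmatically (the app name and the value 50 are examples):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Set default parallelism on the SparkConf before creating the context.
val conf = new SparkConf()
  .setAppName("example")
  .set("spark.default.parallelism", "50")
val sc = new SparkContext(conf)
```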
Hi,
I want to submit my Spark program from my machine to a YARN cluster in
yarn-client mode.
How do I specify all the required details through the Spark submitter?
Please provide me some details.
-Naveen.
To: Naveen Kumar Pokala
Cc: user@spark.apache.org
Subject: Re: Submit Spark driver on Yarn Cluster in client mode
You can export the hadoop configurations dir (export HADOOP_CONF_DIR=XXX) in
the environment and then submit it like:
./bin/spark-submit
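A fuller sketch of that command (the conf-dir path, class, and jar path are examples):

```shell
export HADOOP_CONF_DIR=/etc/hadoop/conf   # example path to the cluster's Hadoop conf
./bin/spark-submit \
  --master yarn-client \
  --class com.example.Main \
  /path/to/app.jar
```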
Thanks Akhil.
-Naveen.
From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: Tuesday, November 18, 2014 1:19 PM
To: Naveen Kumar Pokala
Cc: user@spark.apache.org
Subject: Re: Null pointer exception with larger datasets
Make sure your list is not null; if it is null then it's more like
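A minimal defensive sketch along those lines (assuming `sc` is the SparkContext and `list` is the input collection):

```scala
// Guard against a null list and null elements, both common
// NullPointerException sources when parallelizing and saving.
require(list != null, "input list must not be null")
val cleaned = list.filter(_ != null)
val rdd = sc.parallelize(cleaned)
```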
Hi,
JavaRDD<Instrument> studentsData = sc.parallelize(list); // list is student info, List<Student>
studentsData.saveAsTextFile("hdfs://master/data/spark/instruments.txt");
above statements saved the students information in the HDFS as a text file.
Each object is a line in text file as below.
Hi,
I have a list of Students of size one lakh (100,000), and when I try to save the
file it throws a null pointer exception.
JavaRDD<Student> distData = sc.parallelize(list);
distData.saveAsTextFile("hdfs://master/data/spark/instruments.txt");
14/11/18 01:33:21 WARN scheduler.TaskSetManager: Lost
Hi,
I am receiving following error when I am trying to run sample spark program.
Caused by: java.lang.UnsatisfiedLinkError:
case class Instrument(issue: Issue = null)
-Naveen
From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Wednesday, November 12, 2014 12:09 AM
To: Xiangrui Meng
Cc: Naveen Kumar Pokala; user@spark.apache.org
Subject: Re: scala.MatchError
Xiangrui is correct that it must be a Java bean
Hi,
How do I set the above properties on JavaSQLContext? I am not able to see a
setConf method on the JavaSQLContext object.
I have added the spark-core jar and the spark-assembly jar to my build path, and
I am using Spark 1.1.0 and Hadoop 2.4.0.
--Naveen
Thanks Akhil.
-Naveen
From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: Wednesday, November 12, 2014 6:38 PM
To: Naveen Kumar Pokala
Cc: user@spark.apache.org
Subject: Re: Spark SQL configurations
JavaSQLContext.sqlContext.setConf is available.
Thanks
Best Regards
On Wed, Nov 12, 2014
Hi,
I am facing the following problem when I am trying to save my RDD as a Parquet
file.
14/11/12 07:43:59 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0
(TID 48,): org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
Hi,
I am on Spark 1.1.0. I need help with saving an RDD to a JSON file.
How do I do that? And how do I mention the HDFS path in the program?
-Naveen
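Spark 1.1.0 has no built-in JSON writer; a portable sketch is to map each record to a JSON string and save the result as text (the `Person` class and the HDFS path are examples):

```scala
case class Person(name: String, age: Int)

val people = sc.parallelize(Seq(Person("a", 1), Person("b", 2)))
// Serialize each record to one JSON line, then write the lines to HDFS.
val json = people.map(p => s"""{"name":"${p.name}","age":${p.age}}""")
json.saveAsTextFile("hdfs://master/data/spark/people.json")
```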
Hi,
This is my Instrument java constructor.
public Instrument(Issue issue, Issuer issuer, Issuing issuing) {
super();
this.issue = issue;
this.issuer = issuer;
Hi,
JavaRDD<Integer> distData = sc.parallelize(data);
On what basis does parallelize split the data into multiple datasets? And how do
we control how many datasets are executed per executor?
For example, my data is a list of 1000 integers and I have a 2-node YARN
cluster. It is dividing into
How do I check how many cores are running to complete the task of 8 datasets?
(Is there any command or UI to check that?)
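On the partitioning question: `parallelize` takes an optional second argument, `numSlices`, that fixes the partition count (sketch below; the value 8 is an example). Running tasks per executor can be inspected in the driver's web UI, by default on port 4040.

```scala
// Force the collection into 8 partitions instead of the default
// (which is spark.default.parallelism).
val distData = sc.parallelize(data, 8)
println(distData.partitions.length)  // 8
```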
Regards,
Naveen.
From: holden.ka...@gmail.com [mailto:holden.ka...@gmail.com] On Behalf Of
Holden Karau
Sent: Friday, November 07, 2014 12:46 PM
To: Naveen Kumar Pokala
Cc: user
Hi,
I have a 2 node yarn cluster and I am using spark 1.1.0 to submit my tasks.
As per the Spark documentation, the number of cores used is the maximum number
of cores available. So does that mean each node creates as many threads as it
has cores to process the job assigned to it?
For ex,
List<Integer>
Hi,
I have installed a 2-node Hadoop cluster (for example, on Unix machines A and
B: A is the master node and a data node, B is a data node).
I am submitting my driver programs through Spark 1.1.0 with bin/spark-submit
from a PuTTY client on my Windows machine.
I want to debug my program from Eclipse
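One common approach (the port, class, and jar path are examples): start the driver JVM with JDWP debug options via spark-submit, then attach from Eclipse through Run > Debug Configurations > Remote Java Application:

```shell
# Driver JVM waits (suspend=y) for a debugger to attach on port 5005.
./bin/spark-submit \
  --driver-java-options "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005" \
  --class com.example.Main \
  /path/to/app.jar
```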