Hi,
Is there a way to read a text file from inside a Spark executor? I need to
do this for a streaming application where we need to read a file (whose
contents would change) from a closure.
I cannot use the "sc.textFile" method since the SparkContext is not
serializable. I also cannot read a file
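A common workaround (not from this thread, just a sketch): skip the SparkContext entirely and open the file with plain I/O inside the closure, e.g. in `mapPartitions`, so the read happens on the executor. This assumes the file is present at the same path on every executor node; the function name and path below are made up.

```scala
import org.apache.spark.rdd.RDD
import scala.io.Source

// Sketch: open a side file on the executor itself. Plain file I/O captures
// nothing unserializable, unlike SparkContext. Re-opening it in each batch
// picks up changed contents.
def tagAgainstLookup(rdd: RDD[String], path: String): RDD[(String, Boolean)] =
  rdd.mapPartitions { iter =>
    val src = Source.fromFile(path)                 // runs on the executor
    val lookup = try src.getLines().toSet finally src.close()
    iter.map(line => (line, lookup.contains(line)))
  }
```

In a streaming job this would sit inside `transform`/`foreachRDD`, so the file is re-read every batch rather than once at startup.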
Hi,
I am using the Spark direct stream to consume from multiple topics in Kafka. I
am able to consume fine, but I am stuck at how to separate the data for each
topic, since I need to process the data differently depending on the topic.
I basically want to split the RDD consisting of N topics into N RDDs.
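One possible approach (a sketch against the Spark 1.3-era direct stream API, not a tested answer): each partition of a direct-stream RDD corresponds to exactly one Kafka topic-partition, so you can tag every record with its topic from the `offsetRanges` and then filter once per topic. `rdd` is the batch RDD inside `foreachRDD`, and `topics` is assumed to be your sequence of topic names.

```scala
import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

// Inside stream.foreachRDD { rdd => ... }:
// partition i of a direct-stream RDD was read from offsetRanges(i).
val ranges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
val tagged = rdd.mapPartitionsWithIndex { (i, iter) =>
  val topic = ranges(i).topic
  iter.map(record => (topic, record))
}
// One RDD per topic (hypothetical `topics: Seq[String]`):
val perTopic = topics.map(t => t -> tagged.filter(_._1 == t).values).toMap
```

The `offsetRanges` cast must be done on the direct stream's RDD itself, before any transformation that changes the partitioning.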
Hi,
In earlier versions of Spark (< 1.4.0), we were able to specify the sampling
ratio while using *sqlContext.jsonFile* or *sqlContext.jsonRDD*, so that we
don't inspect each and every element while inferring the schema.
I see that the use of these methods is deprecated in the newer Spark
version
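For what it's worth, the replacement path appears to accept the same knob as a data source option (a sketch, not verified against every release; the path is hypothetical):

```scala
// Sketch (Spark 1.4+): jsonFile/jsonRDD were superseded by the DataFrameReader;
// the JSON source reads a "samplingRatio" option for schema inference.
val df = sqlContext.read
  .option("samplingRatio", "0.1")   // infer the schema from ~10% of the records
  .json("/path/to/data.json")       // hypothetical path
```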
Hi,
I am trying to start a Spark Thrift server using the following command on
Spark 1.3.1 running on YARN:
* ./sbin/start-thriftserver.sh --master yarn://resourcemanager.snc1:8032
--executor-memory 512m --hiveconf
hive.server2.thrift.bind.host=test-host.sn1 --hiveconf
, 2015 at 5:32 PM, Cheng, Hao hao.ch...@intel.com wrote:
Did you register temp table via the beeline or in a new Spark SQL CLI?
As far as I know, a temp table cannot be shared across HiveContext instances.
Hao
*From:* Udit Mehta [mailto:ume...@groupon.com]
*Sent:* Wednesday, August 26, 2015 8:19 AM
*To:* user
Hi,
I was wondering what JSON serde Spark SQL uses. I created a JsonRDD out
of a JSON file and then registered it as a temp table to query. I can then
query the table using dot notation for nested structs/arrays. I was
wondering how Spark SQL deserializes the JSON data based on the query.
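To make the question concrete, here is a minimal sketch of the pattern being asked about (Spark 1.x API; as far as I know Spark SQL parses JSON with its own Jackson-based reader against the inferred schema rather than a Hive SerDe):

```scala
// Sketch: infer a schema from JSON, register a temp table, then address
// nested fields with dot notation and array elements with [index].
val rdd = sc.parallelize(Seq("""{"user": {"name": "a", "tags": ["x", "y"]}}"""))
val df = sqlContext.jsonRDD(rdd)
df.registerTempTable("events")
sqlContext.sql("SELECT user.name, user.tags[0] FROM events").show()
```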
Hi,
I am running Spark 1.3 on YARN and am trying to publish some metrics from
my app. I see that we need to use the Codahale library to create a source
and then specify the source in metrics.properties.
Does somebody have a sample metrics source which I can use in my app to
forward the
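A minimal source looks roughly like this (a sketch, not an official template; note that in Spark 1.x the `Source` trait is `private[spark]`, so a custom source typically has to be declared under an `org.apache.spark` package, and all names below are illustrative):

```scala
package org.apache.spark.metrics.source

import com.codahale.metrics.{Counter, MetricRegistry}

// Sketch of a custom Codahale source; register it with Spark's MetricsSystem
// so the sinks configured in metrics.properties pick it up.
class MyAppSource extends Source {
  override val sourceName: String = "myapp"
  override val metricRegistry: MetricRegistry = new MetricRegistry()
  val recordsProcessed: Counter = metricRegistry.counter("recordsProcessed")
}

// Somewhere in the app, after the SparkContext is up:
//   SparkEnv.get.metricsSystem.registerSource(new MyAppSource)
```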
Hi,
I am unable to access the metrics servlet on Spark 1.2. I tried to access
it from the app master UI on port 4040 but I don't see any metrics there. Is
it a known issue with Spark 1.2 or am I doing something wrong?
Also how do I publish my own metrics and view them on this servlet?
Thanks,
I didn't find anything wrong in your mvn command. Can you
check whether the ExecutorLauncher class is in your jar file or not?
BTW: For spark-1.3, you can use the binary distribution from apache.
Thanks.
Zhan Zhang
On Apr 17, 2015, at 2:01 PM, Udit Mehta ume...@groupon.com wrote:
I
-Dhdp.version=2.2.0.0-2041
Is there anything wrong in what I am trying to do?
thanks again!
On Fri, Apr 17, 2015 at 2:56 PM, Zhan Zhang zzh...@hortonworks.com wrote:
Hi Udit,
By the way, do you mind sharing the whole log trace?
Thanks.
Zhan Zhang
On Apr 17, 2015, at 2:26 PM, Udit Mehta
I followed the steps described above and I still get this error:
Error: Could not find or load main class
org.apache.spark.deploy.yarn.ExecutorLauncher
I am trying to build Spark 1.3 on HDP 2.2.
I built spark from source using:
build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive
log4j.properties metrics.properties
slaves.template spark-defaults.conf.template
spark-env.sh.template
*[root@c6402 conf]# more java-opts*
* -Dhdp.version=2.2.0.0-2041*
[root@c6402 conf]#
Thanks.
Zhan Zhang
On Apr 17, 2015, at 3:09 PM, Udit Mehta ume
Hi,
Suppose I have a command and I pass the --files arg as below:
bin/spark-submit --class com.test.HelloWorld --master yarn-cluster
--num-executors 8 --driver-memory 512m --executor-memory 2048m
--executor-cores 4 --queue public * --files $HOME/myfile.txt* --name
test_1
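(For context, a file shipped with `--files` can be looked up on the executors by its bare name via `SparkFiles` — a sketch, assuming the submit command above:)

```scala
import org.apache.spark.SparkFiles
import scala.io.Source

// Sketch: --files $HOME/myfile.txt distributes the file to every node;
// SparkFiles.get resolves the bare file name to its local path there.
val localPath = SparkFiles.get("myfile.txt")
val src = Source.fromFile(localPath)
val contents = try src.mkString finally src.close()
```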
I have noticed a similar issue when using Spark Streaming. The Spark
shuffle write size increases to a large size (in GB) and then the app
crashes saying:
java.io.FileNotFoundException:
spark.shuffle.spill to false if you don't want to spill to
the disk and assuming you have enough heap memory.
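As a concrete sketch of that suggestion (Spark 1.x only, since `spark.shuffle.spill` was later removed; disabling it trades spilling for OOM risk):

```scala
import org.apache.spark.SparkConf

// Sketch: disable shuffle spill-to-disk. Only do this with enough
// executor heap, or jobs will OOM instead of spilling.
val conf = new SparkConf()
  .setAppName("myapp")                  // illustrative name
  .set("spark.shuffle.spill", "false")
```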
On Tue, Mar 31, 2015 at 12:35 PM, Udit Mehta ume...@groupon.com wrote:
I have noticed a similar issue when using spark streaming. The spark
shuffle
write size increases to a large
Hi,
Is it possible to put the log4j.properties in the application jar such that
the driver and the executors use this log4j file? Do I need to specify
anything while submitting my app so that this file is used?
Thanks,
Udit
Has this issue been fixed in Spark 1.2:
https://issues.apache.org/jira/browse/SPARK-2624
On Mon, Mar 23, 2015 at 9:19 PM, Udit Mehta ume...@groupon.com wrote:
I am trying to run a simple query to view tables in my Hive metastore
using HiveContext.
I am getting this error:
spark Persistence
Another question related to this: how can we propagate the hive-site.xml to
all workers when running in yarn-cluster mode?
On Tue, Mar 24, 2015 at 10:09 AM, Marcelo Vanzin van...@cloudera.com
wrote:
It does neither. If you provide a Hive configuration to Spark,
HiveContext will connect to
?
Cheers
On Fri, Mar 20, 2015 at 1:43 PM, Udit Mehta ume...@groupon.com wrote:
Hi,
We have Spark set up such that there are various users running multiple
jobs at the same time. Currently all the logs go to one file specified in the
log4j.properties.
Is it possible to configure log4j in Spark
I am trying to run a simple query to view tables in my Hive metastore using
HiveContext.
I am getting this error:
spark Persistence process has been specified to use a *ClassLoaderResolver* of
name datanucleus yet this has not been found by the DataNucleus plugin
mechanism. Please check your
Hi,
We have spark setup such that there are various users running multiple jobs
at the same time. Currently all the logs go to 1 file specified in the
log4j.properties.
Is it possible to configure log4j in spark for per app/user logging instead
of sending all logs to 1 file mentioned in the