Hi,
I need to use the batch start time in my Spark Streaming job.
I need the value of the batch start time inside one of the functions that is
called within a flatMap function in Java.
Please suggest how this can be done.
I tried to use the StreamingListener class and set the value of a variable
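One way to get at the batch start time without a listener is the transform()
overload that exposes the batch Time. A minimal Scala sketch (the Java API is
analogous; "lines" and processLine are hypothetical placeholders):

    val words = lines.transform { (rdd, batchTime) =>
      // batchTime is the start time of the current micro-batch
      val startMillis = batchTime.milliseconds
      // capture it in the closure used by flatMap
      // (processLine is assumed to return a collection of results per line)
      rdd.flatMap(line => processLine(line, startMillis))
    }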
Hi folks,
Does anyone know whether the Grid Search capability is enabled since issue
SPARK-9011 in version 1.4.0? I'm getting a "rawPredictionCol column doesn't
exist" error when trying to perform a grid search with Spark 1.4.0.
Cheers,
Ardo
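For reference, the usual grid-search wiring on 1.4 looks roughly like the
sketch below (names are illustrative, not taken from Ardo's code). One common
cause of the "rawPredictionCol column doesn't exist" error is an evaluator
looking for a rawPrediction column that the fitted model never produced:

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
    import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

    val lr = new LogisticRegression()
    val pipeline = new Pipeline().setStages(Array(lr))
    val grid = new ParamGridBuilder()
      .addGrid(lr.regParam, Array(0.1, 0.01))
      .build()
    val cv = new CrossValidator()
      .setEstimator(pipeline)
      .setEvaluator(new BinaryClassificationEvaluator())  // reads rawPrediction by default
      .setEstimatorParamMaps(grid)
      .setNumFolds(3)
    // val cvModel = cv.fit(trainingDF)  // trainingDF is a hypothetical DataFrame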
Hey Eyal, I just checked the Couchbase Spark connector jar. The target
version of some of the classes is Java 8 (52.0). You can create a ticket at
https://issues.couchbase.com/projects/SPARKC
Best Regards,
Shixiong Zhu
2015-11-26 9:03 GMT-08:00 Ted Yu :
> StoreMode is from
Could you attach the YARN AM log?
On Fri, Nov 27, 2015 at 8:10 AM, Jagat Singh wrote:
> Hi,
>
> What is the correct way to stop fully the Spark job which is running as
> yarn-client using spark-submit.
>
> We are using sc.stop in the code and can see the job still running
Hi,
What is the correct way to fully stop a Spark job that is running in
yarn-client mode via spark-submit?
We are calling sc.stop in the code and can still see the job running (in the
YARN resource manager) after the final Hive insert is complete.
The code flow is
start context
do some work
insert to
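A minimal Scala sketch of the usual shutdown pattern (names are illustrative):
stop the context in a finally block so it runs even if the work throws, and
let main() return afterwards so the YARN application can unregister:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("my-job"))
    try {
      // do some work ...
      // insert to Hive ...
    } finally {
      sc.stop()  // releases executors and lets the YARN application finish
    }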
HDFS has a default replication factor of 3, so a 3.8 TB dataset takes up
roughly 3 x 3.8 ≈ 11.4 TB of raw capacity, which is in line with the 11.59 TB
reported.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Why-does-a-3-8-T-dataset-take-up-11-59-Tb-on-HDFS-tp25471p25497.html
Hi Spark experts,
First of all, happy Thanksgiving!
Then comes my question: I have implemented a custom Hadoop InputFormat to
load millions of entities from my data source into Spark (as a JavaRDD, then
transformed to a DataFrame). The approach I took in implementing the custom
Hadoop RDD is loading all
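For anyone following along, the generic way to plug a custom InputFormat into
Spark looks roughly like the following Scala sketch (all class names here are
hypothetical placeholders, not the poster's actual classes):

    import org.apache.hadoop.conf.Configuration

    val hadoopConf = new Configuration()
    // hadoopConf.set("my.source.endpoint", "...")  // whatever the format needs
    val entities = sc.newAPIHadoopRDD(
      hadoopConf,
      classOf[MyEntityInputFormat],  // a custom InputFormat[MyKey, MyEntity]
      classOf[MyKey],
      classOf[MyEntity])
    // ideally each split streams its records lazily instead of loading everything up front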
1. GraphX doesn't have a concept of undirected graphs; edges are always
specified with a srcId and a dstId. However, there is nothing to stop you
adding edges that point in the other direction, i.e. if you have an edge
srcId -> dstId you can add an edge dstId -> srcId (see the sketch after this
list).
2. In general APIs will
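A minimal Scala sketch of point 1, assuming an existing Graph[VD, ED] named
"graph": add a reversed copy of every edge so traversals behave as if the
graph were undirected.

    import org.apache.spark.graphx.{Edge, Graph}

    val reversedEdges = graph.edges.map(e => Edge(e.dstId, e.srcId, e.attr))
    val undirected = Graph(graph.vertices, graph.edges.union(reversedEdges))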
Hi,
I am building a Spark SQL application in Java. I created a Maven project in
Eclipse and added all dependencies, including spark-core and spark-sql. I am
creating a HiveContext in my Spark program and then trying to run SQL queries
against my Hive table. When I submit this job in Spark, for some
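A hedged sketch of the intended flow, shown in Scala for brevity (the Java API
is analogous; the table name is hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("spark-sql-app"))
    val hiveContext = new HiveContext(sc)
    val result = hiveContext.sql("SELECT * FROM my_hive_table LIMIT 10")
    result.show()
    // hive-site.xml needs to be on the classpath (e.g. in spark/conf)
    // for the Hive table to resolve -- see the reply further down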
Hi.
I am doing very large collectAsMap() operations (about 10,000,000 records),
and I am getting
"org.apache.spark.SparkException: Error communicating with MapOutputTracker"
errors.
Details:
"org.apache.spark.SparkException: Error communicating with MapOutputTracker
at
Is there a way to control how large the part- files are for a parquet
dataset? I'm currently using e.g.
results.toDF.coalesce(60).write.mode("append").parquet(outputdir)
to manually reduce the number of parts, but this doesn't map linearly to
fewer parts: I noticed that coalescing to 30 actually
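One alternative worth trying (a hedged sketch using the same "results"
variable): repartition() does a full shuffle, so the 60 output files come out
roughly even in size, whereas coalesce() only merges existing partitions and
can leave the sizes skewed.

    results.toDF.repartition(60).write.mode("append").parquet(outputdir)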
Hi,
I just built Spark without Hive jars and am trying to run
start-master.sh
I get this error in the log; it sounds like it cannot find org.slf4j.Logger:
java.lang.ClassNotFoundException: org.slf4j.Logger
Spark Command: /usr/java/latest/bin/java -cp
I don't think it is a deliberate design.
So you may need to call checkpoint() on the RDD before running an action on
it, if you want to explicitly checkpoint the RDD.
2015-11-26 13:23 GMT+08:00 wyphao.2007 :
> Spark 1.5.2.
>
> On 2015-11-26 13:19:39, "张志强(旺轩)" wrote:
>
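A minimal Scala sketch of the suggestion above (paths are illustrative):

    sc.setCheckpointDir("hdfs:///tmp/checkpoints")
    val rdd = sc.textFile("hdfs:///data/input").map(_.toUpperCase)
    rdd.cache()       // optional: avoids recomputing the lineage for the checkpoint job
    rdd.checkpoint()  // must be called before any action on this RDD
    rdd.count()       // running an action is what actually materializes the checkpoint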
Hi all,
I have an uber jar made with Maven; the contents are:
my.org.my.classes.Class
...
lib/lib1.jar // 3rd party libs
lib/lib2.jar
I'm using this kind of jar for Hadoop applications and everything works fine.
I added the Spark libs, Scala and everything else Spark needs, but when I submit
this jar to
I'm not 100% sure, but I don't think a jar within a jar will work without a
custom class loader. You can perhaps try to use "maven-assembly-plugin" or
"maven-shade-plugin" to build your uber/fat jar. Both of these will build a
flattened single jar.
--
Ali
On Nov 26, 2015, at 2:49 AM, Marc de
I'm using Spark 1.4.2 with Hadoop 2.7. I tried increasing
spark.shuffle.io.maxRetries to 10 but it didn't help.
Any ideas on what could be causing this?
This is the exception that I am getting:
[MySparkApplication] WARN : Failed to execute SQL statement select *
from TableS s join TableC c on
Hi,
In Python, how do I use an InputFormat / custom RecordReader?
Thanks,
Patcharee
Hi guys, when I am trying to connect Hive with Spark SQL, I get a problem like
the one below:
[root@master spark]# bin/spark-shell --master local[4]
log4j:WARN No appenders could be found for logger
(org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system
Please take a look at:
python/pyspark/tests.py
There are examples using sc.hadoopFile() and sc.newAPIHadoopRDD()
Cheers
On Thu, Nov 26, 2015 at 4:50 AM, Patcharee Thongtra <
patcharee.thong...@uni.no> wrote:
> Hi,
>
> In python how to use inputformat/custom recordreader?
>
> Thanks,
>
bq. (Permission denied)
Have you checked the permission for /mnt/md0/var/lib/spark/... ?
Cheers
On Thu, Nov 26, 2015 at 3:03 AM, Sahil Sareen wrote:
> Im using Spark1.4.2 with Hadoop 2.7, I tried increasing
> spark.shuffle.io.maxRetries to 10 but didn't help.
>
> Any
Hi,
I am trying to set up a connection to Couchbase. I am at the very beginning,
and I got stuck on this exception:
Exception in thread "main" java.lang.UnsupportedClassVersionError:
com/couchbase/spark/StoreMode : Unsupported major.minor version 52.0
Here is the simple code fragment
val sc =
This implies a version mismatch between the JDK used to build your jar and
the one at runtime.
When building, target JDK 1.7.
There are plenty of posts on the web about dealing with this error.
Cheers
On Thu, Nov 26, 2015 at 7:31 AM, Eyal Sharon wrote:
> Hi,
>
> I am trying to
Have you seen this thread?
http://search-hadoop.com/m/q3RTtCoKmv14Hd1H1=Re+Spark+Hive+max+key+length+is+767+bytes
On Thu, Nov 26, 2015 at 5:26 AM, wrote:
> hi guys,
>
> when I am trying to connect hive with spark-sql,I got a problem like
> below:
>
>
> [root@master
Hi,
Great, that gave me some direction. But can you elaborate more, or share
a post?
I am currently running JDK 7, and so is my Couchbase.
Thanks!
On Thu, Nov 26, 2015 at 6:02 PM, Ted Yu wrote:
> This implies version mismatch between the JDK used to build your jar and
>
StoreMode is from the Couchbase connector.
Where did you obtain the connector?
See also
http://stackoverflow.com/questions/1096148/how-to-check-the-jdk-version-used-to-compile-a-class-file
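For example, after extracting the class from the connector jar, something like
the following prints the class file version (52 corresponds to Java 8):

    javap -verbose com/couchbase/spark/StoreMode.class | grep "major version"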
On Thu, Nov 26, 2015 at 8:55 AM, Eyal Sharon wrote:
> Hi ,
> Great , that gave some
Hi Spark people,
I have a Hive table with a lot of small Parquet files, and I am creating a
DataFrame out of it to do some processing, but since I have a large number of
splits/files my job creates a lot of tasks, which I don't want. Basically
what I want is the same functionality that Hive
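A hedged Scala sketch of one workaround (the table name is illustrative):
coalescing right after the scan does not reduce the number of input splits,
but it does cap the number of tasks, since each task reads several files.

    val df = sqlContext.table("many_small_files_table").coalesce(100)
    df.registerTempTable("compacted_view")  // downstream queries now run with ~100 tasks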
An interesting approach to compacting small files was discussed recently:
http://blog.cloudera.com/blog/2015/11/how-to-ingest-and-query-fast-data-with-impala-without-kudu/
AFAIK Spark supports views too.
--
Ruslan Dautkhanov
On Thu, Nov 26, 2015 at 10:43 AM, Nezih Yigitbasi <
I am using spark-1.5.1-bin-hadoop2.6. I used
spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2 to create a cluster and configured
spark-env to use python3. I get an exception 'Randomness of hash of string
should be disabled via PYTHONHASHSEED'. Is there any reason rdd.py should
not just set PYTHONHASHSEED
Thanks Davies and Nathan,
I found my error.
I was using *ArrayType()* and I needed to pass the element type contained in
the array, but I was not passing *ArrayType(IntegerType())*.
Thanks :)
On Wed, Nov 25, 2015 at 7:46 PM, Davies Liu wrote:
> It works in master (1.6), what's
Hi,
I think you just need to put hive-site.xml in the spark/conf directory and it
will be loaded onto the Spark classpath.
Best,
Sun.
fightf...@163.com
From: Chandra Mohan, Ananda Vel Murugan
Date: 2015-11-27 15:04
To: user
Subject: error while creating HiveContext
Hi,
I am building a
Hi All,
Apologies if this question has been asked before. I'd like to know if there
are any downsides to running Spark over YARN with the --master yarn-cluster
option vs. having a separate Spark standalone cluster to execute jobs.
We're looking at installing an HDFS/Hadoop cluster with Ambari and
If your cluster is a dedicated Spark cluster (only running Spark jobs, no
other workloads like Hive/Pig/MR), then Spark standalone would be fine.
Otherwise I think YARN would be a better option.
On Fri, Nov 27, 2015 at 3:36 PM, cs user wrote:
> Hi All,
>
> Apologies if this
For such a large output, I would suggest doing the processing in the cluster
rather than in the driver (use the RDD API to do that).
If you really need to pull it to the driver, you can first save it to HDFS
and then read it using the HDFS API, to avoid the Akka issue.
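A minimal Scala sketch of that suggestion, assuming a pair RDD named "pairs"
(names and paths are illustrative):

    // keep the aggregation on the executors and persist the result,
    // instead of pulling ~10M records into the driver with collectAsMap()
    pairs.reduceByKey(_ + _).saveAsTextFile("hdfs:///tmp/collect-output")
    // if the driver really needs the data, read these files back afterwards
    // (HDFS API or sc.textFile) rather than through the task results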
On Fri, Nov 27, 2015 at 2:41 PM,