Hi All
I am running a Spark Streaming job with the below configuration:
--conf "spark.executor.extraJavaOptions=-Droot.logger=WARN,console"
But it's still filling the disk with INFO logs.
If the logging level is set to WARN at the cluster level, then only the WARN
logs get written, but then it
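A common alternative (untested here): Spark's default log4j.properties hardcodes the root level, so -Droot.logger often has no effect. Shipping a custom log4j.properties to the driver and executors usually does, e.g.:

--files log4j.properties \
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties"

with log4j.rootCategory=WARN, console set in that file.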
It seems that we are using the function incorrectly.
val a = Seq((1,10),(2,20)).toDF("foo","bar")
val b = a.select($"foo")
val c = b.where(b("bar") === 20)
c.show
Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot
resolve column name "bar" among
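A hedged workaround sketch on the same data: filter first and project afterwards, so "bar" is still present when the Column is resolved:

val c = a.where($"bar" === 20).select($"foo")
c.show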
Elementwise product is described here:
https://spark.apache.org/docs/latest/mllib-feature-extraction.html#elementwiseproduct
I don't know if it will work with your input, though.
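For reference, a minimal sketch of the linked transformer (the vectors are made-up examples):

import org.apache.spark.mllib.feature.ElementwiseProduct
import org.apache.spark.mllib.linalg.Vectors

val transformingVector = Vectors.dense(0.0, 1.0, 2.0)
val transformer = new ElementwiseProduct(transformingVector)
transformer.transform(Vectors.dense(1.0, 2.0, 3.0))  // -> [0.0, 2.0, 6.0]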
Please share your code.
From: Thijs Haarhuis
Sent: Wednesday, February 13, 2019 6:09 AM
To: user@spark.apache.org
Subject: SparkR + binary type + how to get value
Hi all,
Does anybody have any experience in accessing the data from a column which has
a binary
It seems that the fabric8 Kubernetes client can't parse the caCertFile in the
default location /var/run/secrets/kubernetes.io/serviceaccount/ca.crt. Can
anybody give me some advice?
On Wed, 13 Feb 2019 at 16:21, dawn breaks <2005dawnbre...@gmail.com> wrote:
> we submit spark job to k8s by the
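If the in-cluster defaults can't be parsed, Spark can also be pointed at a CA cert explicitly; a hedged sketch (the path is a placeholder):

--conf spark.kubernetes.authenticate.submission.caCertFile=/path/to/ca.crt \
--conf spark.kubernetes.authenticate.driver.caCertFile=/path/to/ca.crt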
Yeah, the filter gets pushed in front of the select after analysis:
scala> b.where($"bar" === 20).explain(true)
== Parsed Logical Plan ==
'Filter ('bar = 20)
+- AnalysisBarrier
      +- Project [foo#6]
         +- Project [_1#3 AS foo#6, _2#4 AS bar#7]
            +- SerializeFromObject
Hmm, I'm not asking about using k8s to control Spark as a job manager or
scheduler like YARN. We use the built-in standalone Spark job manager and
spark://spark-api:7077 as the master, not k8s.
The problem is using k8s to manage a cluster consisting of our app, some
databases, and Spark (one
This is indeed strange. To add to the question, I can see that if I use a
filter I get an exception (as expected), so I am not sure what the
difference is between the where clause and filter:
b.filter(s => {
  val bar: String = s.getAs("bar")
  bar.equals("20")
}).show
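If I read the plans above correctly (hedged): a typed filter receives rows with b's actual schema, which only contains "foo", so getAs("bar") fails at runtime, while an unresolved Column like $"bar" is bound by the analyzer against the plan below the Project and therefore succeeds:

b.where(b("bar") === 20)                  // AnalysisException: b's schema has no "bar"
b.where($"bar" === 20)                    // works: resolved below the Project by the analyzer
b.filter(r => r.getAs[Int]("bar") == 20)  // fails at runtime: the Row only contains "foo"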
I don't know if this is a bug or a feature, but it's a bit counter-intuitive
when reading code.
The "b" dataframe does not have field "bar" in its schema, but is still able to
filter on that field.
scala> val a = sc.parallelize(Seq((1,10),(2,20))).toDF("foo","bar")
a:
Thanks Peter.
I'm not sure if that is possible yet. The closest I can think of to
achieving what you want is to try something like:
df.registerTempTable("mytable")
sql("create table mymanagedtable as select * from mytable")
I haven't used CTAS in Spark SQL before but have heard it works. This
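For what it's worth, the same sketch on the non-deprecated API (createOrReplaceTempView replaces registerTempTable; table names as above):

df.createOrReplaceTempView("mytable")
spark.sql("CREATE TABLE mymanagedtable AS SELECT * FROM mytable")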
Currently there seem to be three places to check task-level logs:
1) Using the Spark UI
2) `yarn logs -applicationId <appId>`
3) Log aggregation on HDFS (if enabled)
All of the above only give you logs at the executor (container) level. However,
one executor can have multiple threads and each might be running part of
different
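One workaround sketch (hedged; assumes the log4j 1.x that Spark 2.x ships, and rdd stands in for your own RDD): have tasks tag their log lines with the thread name, so per-task activity can be told apart within a single executor log. Adding %t to the log4j layout achieves the same thing declaratively.

import org.apache.log4j.Logger

rdd.foreachPartition { _ =>
  // each task records which executor thread it ran on
  Logger.getLogger("task-log").warn(
    s"partition processed on thread ${Thread.currentThread().getName}")
}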
Hi Chris,
Thank you for the input. I know I can always write the table DDL manually.
But here I would like to rely on Spark generating the schema. What I don't
understand is the change in the behaviour of Spark: having the storage path
specified does not necessarily mean it should be an external
Hello, I need a design recommendation.
I need to perform a couple of calculations with minimal shuffling and better
performance. I have a nested structure where, say, a class has n students,
and the structure will be similar to this:
{ classId: String,
  studentId: String,
  score: Int,
  areaCode: String }
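A hedged sketch of the single-shuffle direction I would look at first (field names follow the structure above; the sample rows and metrics are made up):

import org.apache.spark.sql.functions._

// assumes spark-shell; otherwise import spark.implicits._ first
val df = Seq(
  ("c1", "s1", 90, "A"),
  ("c1", "s2", 70, "A"),
  ("c2", "s3", 80, "B")
).toDF("classId", "studentId", "score", "areaCode")

// one groupBy = one shuffle; compute all per-class metrics in a single pass
df.groupBy($"classId")
  .agg(avg($"score").as("avgScore"), max($"score").as("maxScore"))
  .show()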
Hi all,
Does anybody have any experience in accessing the data from a column which has
a binary type in a Spark Data Frame in R?
I have a Spark Data Frame which has a column which is of a binary type. I want
to access this data and process it.
In my case I collect the Spark data frame to an R
Hi Dawn,
Probably you are providing the incorrect image (it must be a Java image), the
incorrect master IP, or the wrong service account. Please verify the pod's
permissions for the service account ('spark' in your case).
I have tried executing the same program as below:
./spark-submit --master
Hey there,
Could you not just create a managed table using DDL in Spark SQL and
then write the data frame to the underlying folder, or use Spark SQL to do
an insert?
Alternatively, try CREATE TABLE AS SELECT (CTAS). IIRC Hive creates managed
tables this way.
I've not confirmed this works but I
Dear All,
I am facing a strange issue with Spark 2.3, where I would like to create a
MANAGED table out of the content of a DataFrame with the storage path
overridden.
Apparently, when one tries to create a Hive table via
DataFrameWriter.saveAsTable, supplying a "path" option causes Spark to
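For concreteness, a sketch of the call being discussed (path and table name are placeholders); the reported Spark 2.3 behavior is that the "path" option flips the result to EXTERNAL:

df.write
  .option("path", "/custom/warehouse/location")  // with this set, Spark 2.3 creates an EXTERNAL table
  .saveAsTable("mydb.mytable")                   // without "path", a MANAGED table under the warehouse dir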
Adding to Gabor's answer: in Spark 3.0 end users can even provide the full
group ID (please refer to SPARK-26350 [1]), but you may find it more convenient
to use the group ID prefix Gabor mentioned (please refer to SPARK-26121 [2]) to
grant permission to broader ranges of groups.
1.
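For reference, a sketch of where the two options land in the Structured Streaming Kafka source (broker and topic are placeholders; use one of the two group-ID settings):

val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")  // placeholder
  .option("subscribe", "mytopic")                    // placeholder
  .option("groupIdPrefix", "myapp")                  // SPARK-26121: prefix for generated group IDs
  // .option("kafka.group.id", "myapp-group")        // SPARK-26350 (Spark 3.0+): full group ID
  .load()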
Hi Thomas,
The issue occurs when the user does not have the READ permission on the
consumer groups.
In DStreams, the group ID is configured in the application, for example:
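The example is cut off here; a typical spark-streaming-kafka-0-10 configuration looks roughly like this (broker and group names are placeholders):

import org.apache.kafka.common.serialization.StringDeserializer

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "my-consumer-group"  // this is the group that needs READ permission
)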
We submit a Spark job to k8s with the following command, and the driver pod got
an error and exited. Can anybody help us to solve it?
./bin/spark-submit \
--master k8s://https://172.21.91.48:6443 \
--deploy-mode cluster \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark