Roy - can you check if you have HADOOP_CONF_DIR and YARN_CONF_DIR set to the
directory containing the HDFS and YARN configuration files?
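For reference, a minimal sketch of what that setup might look like before launching spark-submit (the path /etc/hadoop/conf is a typical location, not necessarily yours):

```shell
# Point Spark at the cluster configuration before launching.
# /etc/hadoop/conf is an assumed, typical location; adjust to your install.
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
```

Both variables usually point at the same directory; Spark reads core-site.xml, hdfs-site.xml, and yarn-site.xml from there.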
From: Sandeep Nemuri
Date: Monday, March 27, 2017 at 9:44 AM
To: Saisai Shao
Cc: Yong Zhang
You should try adding your NN host and port to the URL.
On Mon, Mar 27, 2017 at 11:03 AM, Saisai Shao
wrote:
> It's quite obvious your HDFS URL is not complete; please look at the
> exception: your HDFS URI doesn't have a host or port. Normally it should be OK
> if HDFS is
Hi all,
I am trying to trap the UI kill event of a Spark application from the driver.
Somehow the exception thrown is not propagated to the driver main
program. See for example using spark-shell below.
Is there a way to get hold of this event and shut down the driver program?
Regards,
Noorul
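One possible approach (a sketch, not a confirmed solution to the exception-propagation issue above) is to register a SparkListener and react when the application ends; `sc` below is assumed to be an existing SparkContext, and the callback fires on both normal and killed termination:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}

// Hypothetical sketch: run shutdown logic when the application ends,
// which includes a kill issued from the UI.
sc.addSparkListener(new SparkListener {
  override def onApplicationEnd(end: SparkListenerApplicationEnd): Unit = {
    println(s"Application ended at ${end.time}; running cleanup")
    // place driver shutdown logic here
  }
})
```

Note the listener cannot by itself distinguish a UI kill from a normal stop; it only observes that the application ended.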
Sending a plain-text mail to test whether my mail appears on the list.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/This-is-a-test-mail-please-ignore-tp28538.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Hi!
In Spark on YARN, when are shuffle files on local disk removed? Is it when
the app completes, once all the shuffle files are fetched, or at the end of
the stage?
Thanks,
Ashwin
Hi Hyukjin,
It was a false alarm as I had a local change to `def schema` in
`Dataset` that caused the issue.
I apologize for the noise. Sorry and thanks a lot for the prompt
response. I appreciate it.
Pozdrawiam,
Jacek Laskowski
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2
Sorry, I don't think that I understand the question. Value is just a
binary blob that we get from Kafka and pass to you. If it's stored as JSON,
I think the code I provided is a good option, but if you are using a
different encoding you may need to write a UDF.
On Fri, Mar 24, 2017 at 4:58 PM,
You need to import col from org.apache.spark.sql.functions.
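In Java that import is a static import; a fragment of what the fix might look like (`df` is an assumed Dataset<Row>):

```java
import static org.apache.spark.sql.functions.col;

// then col(...) resolves as a plain method call:
df.select(col("value").cast("string"));
```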
On Mon, Mar 27, 2017 at 1:20 PM, kaniska Mandal
wrote:
> Hi Michael,
>
> Can you please check if I am using the correct version of the spark-streaming
> library as specified in my pom (specified in the email)?
>
>
The timestamp type has only microsecond precision. You would need to store
it on your own (as binary, a limited-range long, or something similar) if you
require nanosecond precision.
On Mon, Mar 27, 2017 at 5:29 AM, Devender Yadav <
devender.ya...@impetus.co.in> wrote:
> Hi All,
>
> I am using spark
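One way to carry nanosecond precision through a microsecond-only timestamp type is to keep the full value as a single epoch-nanoseconds long in a separate column. A minimal round-trip sketch in plain Python (the packing scheme is an illustration, not a Spark API):

```python
def to_epoch_nanos(seconds: int, nanos: int) -> int:
    """Pack seconds-since-epoch plus a nanosecond remainder into one long."""
    return seconds * 1_000_000_000 + nanos

def from_epoch_nanos(total: int):
    """Recover (seconds, nanosecond remainder) from the packed long."""
    return divmod(total, 1_000_000_000)

packed = to_epoch_nanos(1490572800, 123456789)
print(packed)                    # 1490572800123456789
print(from_epoch_nanos(packed))  # (1490572800, 123456789)
```

A signed 64-bit long in epoch nanoseconds covers roughly the years 1678 to 2262, which is enough for most event data.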
Hi friends,
I have code written in Scala. It currently runs with Scala 2.10.4 and Spark
1.5.2.
I would like to upgrade it to the most recent version of Spark,
meaning 2.1.0.
Here is the build.sbt:
import AssemblyKeys._
assemblySettings
name :=
Hi Michael,
Can you please check if I am using the correct version of the spark-streaming
library as specified in my pom (specified in the email)?
col("value").cast("string") is throwing an error: 'cannot find symbol method
col(java.lang.String)'.
I tried $"value", which results in a similar compilation
Hello!
I am running Spark with Java and bumped into a problem I can't solve or find
anything helpful about among answered questions, so I would really appreciate
your help.
I am running some calculations, creating rows for each result:
List<Row> results = new LinkedList<>();
for(something){
It's quite obvious your HDFS URL is not complete; please look at the
exception: your HDFS URI doesn't have a host or port. Normally it should be OK
if HDFS is your default FS.
I think the problem is that you're running on HDI, where the default FS is
wasb. So here a short name without host:port will lead to
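The difference between the two URI forms can be seen by parsing them; a small stdlib sketch (the NameNode host and port below are made-up examples):

```python
from urllib.parse import urlparse

# A short-form URI has no authority (host:port); only a configured
# default FS can resolve it.
short = urlparse("hdfs:///user/alice/data")
print(short.hostname, short.port)   # None None

# A fully qualified URI names the NameNode explicitly.
full = urlparse("hdfs://namenode.example.com:8020/user/alice/data")
print(full.hostname, full.port)     # namenode.example.com 8020
```

When the default FS is wasb, the short form resolves against wasb rather than HDFS, which is why the fully qualified form is needed on HDI.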
Hi,
While toying with selectExpr I've noticed that the schema changes to
include an id column. I can't seem to explain it. Anyone?
scala> spark.range(1).printSchema
root
|-- value: long (nullable = true)
scala> spark.range(1).selectExpr("*").printSchema
root
|-- id: long (nullable = false)
p.s.
Hi All,
I am using Spark version 1.6.1.
I have a text table in Hive with a `timestamp` datatype with nanosecond
precision.
Hive Table Schema:
c_timestamp timestamp
Hive Table data:
hive> select * from tbl1;
OK
00:00:00.1
12:12:12.123456789
Usually you define the dependencies to the Spark library as provided. You also
seem to mix different Spark versions, which should be avoided.
The Hadoop library seems to be outdated and should also only be provided.
The other dependencies you could assemble in a fat jar.
> On 27 Mar 2017, at
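The advice above about marking Spark and Hadoop as provided might look like this in build.sbt (the version numbers and the choice of spark-core/spark-sql/hadoop-client are illustrative; keep all Spark artifacts on a single version):

```scala
// Illustrative build.sbt fragment: Spark and Hadoop are supplied by the
// cluster at runtime, so exclude them from the assembled fat jar.
val sparkVersion = "2.1.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"    % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql"     % sparkVersion % "provided",
  "org.apache.hadoop" % "hadoop-client" % "2.7.3"      % "provided"
)
```

With `provided`, sbt-assembly leaves these jars out of the fat jar, so only your application code and its remaining dependencies are bundled.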
Yup, that solves the compilation issue :-)
One quick question regarding specifying a Decoder in the Kafka stream:
please note that I am encoding the message as follows while sending data to
Kafka -
*String msg = objectMapper.writeValueAsString(tweetEvent);*
*return msg.getBytes();*
I have a
Ah, I understand what you are asking now. There is no API for specifying a
Kafka-specific "decoder", since Spark SQL already has a rich language for
expressing transformations. The dataframe code I gave will parse the JSON
and materialize it in a class, very similar to what
Shuffle files are cleaned when they are no longer referenced. See
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ContextCleaner.scala
On Mon, Mar 27, 2017 at 12:38 PM, Ashwin Sai Shankar <
ashan...@netflix.com.invalid> wrote:
> Hi!
>
> In spark on yarn, when are
Check these versions:
function create_build_sbt_file {
BUILD_SBT_FILE=${GEN_APPSDIR}/scala/${APPLICATION}/build.sbt
[ -f "${BUILD_SBT_FILE}" ] && rm -f "${BUILD_SBT_FILE}"
cat >> "${BUILD_SBT_FILE}" << !
lazy val root = (project in file(".")).
settings(
name :=
Hi All:
When I use Spark Streaming to consume Kafka data and store it in HDFS in ORC
file format,
it's fast in the beginning but becomes slow several hours later.
environment:
spark version: 2.1.0
kafka version 0.10.1.1
spark-streaming-kafka jar's version: spark-streaming-kafka-0-8-2.11
Thanks Mark! Follow-up question: do you know when shuffle files are usually
unreferenced?
On Mon, Mar 27, 2017 at 2:35 PM, Mark Hamstra
wrote:
> Shuffle files are cleaned when they are no longer referenced. See
> https://github.com/apache/spark/blob/master/core/src/
>
On Jörn Franke, Mar 28, 2017, 12:11 AM wrote:
> Usually you define the dependencies to the Spark library as provided. You also seem to mix different Spark versions, which should be avoided.
> The Hadoop library seems to be outdated and should also only be provided.
> The other
Per SPARK-19630, I am wondering if there are plans to support the "STORED BY"
clause in Spark 2.x?
Thanks!
Hi Soumitra,
We're working on that. The idea here is to use Kafka to get the brokers'
information for the topic and use the Kafka client to find the corresponding
offsets on the new cluster
(https://jeqo.github.io/post/2017-01-31-kafka-rewind-consumers-offset/). You
need Kafka >= 0.10.1.0 because it supports
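The offsets-by-timestamp lookup that requires Kafka >= 0.10.1.0 is exposed through the consumer API as offsetsForTimes; a sketch, assuming a topic named "events" and a KafkaConsumer already configured against the new cluster:

```java
import java.util.Collections;
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

// Sketch: rewind partition 0 of "events" to the first offset at or
// after a given timestamp (here, one hour ago).
TopicPartition tp = new TopicPartition("events", 0);
long timestampMs = System.currentTimeMillis() - 3_600_000L;

Map<TopicPartition, OffsetAndTimestamp> found =
    consumer.offsetsForTimes(Collections.singletonMap(tp, timestampMs));
if (found.get(tp) != null) {
    consumer.seek(tp, found.get(tp).offset());
}
```

offsetsForTimes may return null for a partition when no message has a timestamp at or after the one requested, hence the null check before seeking.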
Thanks for your confirmation.
On 28 Mar 2017 5:02 a.m., "Jacek Laskowski" wrote:
Hi Hyukjin,
It was a false alarm as I had a local change to `def schema` in
`Dataset` that caused the issue.
I apologize for the noise. Sorry and thanks a lot for the prompt
response. I
When the RDD using them goes out of scope.
On Mon, Mar 27, 2017 at 3:13 PM, Ashwin Sai Shankar
wrote:
> Thanks Mark! follow up question, do you know when shuffle files are
> usually un-referenced?
>
> On Mon, Mar 27, 2017 at 2:35 PM, Mark Hamstra
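The "goes out of scope" answer above is driven by weak references on the driver: ContextCleaner registers cleanup for RDDs and shuffles and fires it once they are garbage collected. A rough stdlib analogy in Python (the names here are illustrative, not Spark's API):

```python
import gc
import weakref

cleaned = []

class FakeRdd:
    """Stand-in for an RDD whose shuffle files need cleanup."""
    def __init__(self, shuffle_id):
        self.shuffle_id = shuffle_id

def track(rdd):
    # Like ContextCleaner: register a callback that fires once the object
    # is garbage collected, i.e. no longer referenced anywhere.
    weakref.finalize(rdd, cleaned.append, rdd.shuffle_id)

rdd = FakeRdd(shuffle_id=7)
track(rdd)
del rdd          # drop the last reference
gc.collect()     # in Spark, a periodic GC triggers the same effect
print(cleaned)   # [7]
```

The practical consequence is that shuffle files can linger until the driver actually garbage-collects the referencing RDD, not the moment the stage or fetch finishes.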
Hi, did you guys figure it out?
Thanks
Soumitra
On Sun, Mar 5, 2017 at 9:51 PM, nguyen duc Tuan wrote:
> Hi everyone,
> We are deploying a Kafka cluster for ingesting streaming data. But
> sometimes some of the nodes in the cluster have trouble (a node dies, the
> Kafka daemon is
I just tried to build against the current master to help check:
https://github.com/apache/spark/commit/3fbf0a5f9297f438bc92db11f106d4a0ae568613
It seems I can't reproduce this, as below:
scala> spark.range(1).printSchema
root
|-- id: long (nullable = false)
scala>