UnresolvedAddressException in Kubernetes Cluster

2017-10-09 Thread Suman Somasundar
Hi, I am trying to deploy a Spark app in a Kubernetes cluster. The cluster consists of 2 machines - 1 master and 1 slave - each with the following config: RHEL 7.2, Docker 17.03.1, K8s 1.7. I am following the steps provided in

Re: Cases when to clear the checkpoint directories.

2017-10-09 Thread Tathagata Das
Any changes in the Java code (to be specific, the generated bytecode) in the functions you pass to Spark (i.e., map functions, reduce functions, as well as their closure dependencies) count as an "application code change" and will break recovery from checkpoints. On Sat, Oct 7, 2017 at 11:53 AM,
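A minimal sketch of the recovery path where this bites, assuming the standard getOrCreate pattern (the checkpoint path and batch interval here are illustrative, not from the thread):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "hdfs:///tmp/app-checkpoint" // hypothetical path

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("checkpointed-app")
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)
  // The map/reduce closures defined here are serialized into the
  // checkpoint together with the DStream graph.
  ssc
}

// On restart, getOrCreate deserializes those closures from the checkpoint;
// if their bytecode no longer matches the deployed jar, recovery fails and
// the checkpoint directory must be cleared.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
```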

Re: How to convert Array of Json rows into Dataset of specific columns in Spark 2.2.0?

2017-10-09 Thread kant kodali
https://issues.apache.org/jira/browse/SPARK-8 On Sun, Oct 8, 2017 at 11:58 AM, kant kodali wrote: > I have the following so far > > private StructType getSchema() { > return new StructType() > .add("name", StringType) > .add("address",
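For reference, a hedged sketch of the from_json route in Spark 2.2, assuming the input is a DataFrame with a JSON-string column named value and reusing the name/address fields from the schema above:

```scala
import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types._

// Schema for an array of JSON rows; field names follow the snippet above.
val schema = ArrayType(new StructType()
  .add("name", StringType)
  .add("address", StringType))

// df is assumed to have a string column "value" holding the JSON array.
val parsed = df
  .select(from_json(col("value"), schema).as("rows"))
  .select(explode(col("rows")).as("row"))
  .select(col("row.name"), col("row.address"))
```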

Re: Does Spark 2.2.0 support Dataset<List<Map<String,Object>>> ?

2017-10-09 Thread kant kodali
Tried the following. dataset.map(new MapFunction<String, List<Map<String, Object>>>() { @Override public List<Map<String, Object>> call(String input) throws Exception { List<Map<String, Object>> temp = new ArrayList<>(); temp.add(new HashMap<>());

Re: Does Spark 2.2.0 support Dataset<List<Map<String,Object>>> ?

2017-10-09 Thread Koert Kuipers
if you are willing to use the kryo encoder you can do your original Dataset<List<Map<String, Object>>>: Seq(1,2,3).toDS.map(x => if (x % 2 == 0) x else x.toString)(org.apache.spark.sql.Encoders.kryo[Any]).map{ (x: Any)
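Spelling that out, a sketch of an explicit kryo encoder for the nested collection shape (a Scala analogue of the Java types in the subject line, assuming an active SparkSession named spark):

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders}

// Kryo stores each value as opaque binary, so Spark SQL cannot look
// inside the resulting column (no predicate pushdown, no columnar layout).
implicit val enc: Encoder[Seq[Map[String, Any]]] =
  Encoders.kryo[Seq[Map[String, Any]]]

// spark is assumed to be an active SparkSession.
val ds: Dataset[Seq[Map[String, Any]]] =
  spark.createDataset(Seq(Seq(Map("name" -> "a", "count" -> 1))))
```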

Re: Does Spark 2.2.0 support Dataset<List<Map<String,Object>>> ?

2017-10-09 Thread kant kodali
Hi Koert, Thanks! If I have this Dataset<List<Map<String, Object>>>, what would be the encoding? Is it Encoders.kryo(Seq.class)? Also, shouldn't List be supported? Should I create a ticket for this? On Mon, Oct 9, 2017 at 6:10 AM, Koert Kuipers wrote: > it supports

Why does Spark need to set log levels

2017-10-09 Thread Daan Debie
Hi all! I would love to use Spark with a somewhat more modern logging framework than Log4j 1.2. I have Logback in mind, mostly because it integrates well with central logging solutions such as the ELK stack. I've read up a bit on getting Spark 2.0 (that's what I'm using currently) to work with
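The usual workaround (a hedged sbt sketch, not an official recipe): exclude Spark's Log4j 1.2 binding and bridge it to Logback through log4j-over-slf4j. Versions here are illustrative:

```scala
// build.sbt
libraryDependencies ++= Seq(
  ("org.apache.spark" %% "spark-core" % "2.2.0")
    .exclude("log4j", "log4j")
    .exclude("org.slf4j", "slf4j-log4j12"),
  "org.slf4j" % "log4j-over-slf4j" % "1.7.25", // reroutes log4j 1.2 calls to SLF4J
  "ch.qos.logback" % "logback-classic" % "1.2.3"
)
```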

Re: [Spark SQL] Missing data in Elasticsearch when writing data with elasticsearch-spark connector

2017-10-09 Thread ayan guha
Have you raised it as an issue on the ES connector's GitHub? In my past experience (with the Hadoop connector for Pig), they respond pretty quickly. On Tue, Oct 10, 2017 at 12:36 AM, sixers wrote: > ### Issue description > > We have an issue with data consistency when storing data in

[Spark SQL] Missing data in Elasticsearch when writing data with elasticsearch-spark connector

2017-10-09 Thread sixers
### Issue description We have an issue with data consistency when storing data in Elasticsearch using Spark and the elasticsearch-spark connector. The job finishes successfully, but when we compare the original data (stored in S3) with the data stored in ES, some documents are not present in
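For context, a sketch of the connector's DataFrame write path with its retry and document-id options spelled out (the host, index, and column names are assumptions); setting es.mapping.id makes retried writes idempotent, so re-runs overwrite rather than duplicate:

```scala
// df is the DataFrame already written to S3; "id" is an assumed unique key.
df.write
  .format("org.elasticsearch.spark.sql")
  .option("es.nodes", "es-host:9200")        // assumed ES endpoint
  .option("es.batch.write.retry.count", "6") // retry bulk rejections before failing
  .option("es.batch.write.retry.wait", "30s")
  .option("es.mapping.id", "id")             // document id column
  .save("index/type")
```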

Re: Does Spark 2.2.0 support Dataset<List<Map<String,Object>>> ?

2017-10-09 Thread Koert Kuipers
it supports Dataset<List<Map<String, X>>> where X must be a supported type also. Object is not a supported type. On Mon, Oct 9, 2017 at 7:36 AM, kant kodali wrote: > Hi All, > > I am wondering if Spark supports Dataset<List<Map<String, Object>>>? > > when I do the following

Does Spark 2.2.0 support Dataset<List<Map<String,Object>>> ?

2017-10-09 Thread kant kodali
Hi All, I am wondering if Spark supports Dataset<List<Map<String, Object>>>? When I do the following, it says no map function is available: Dataset<List<Map<String, Object>>> resultDs = ds.map(lambda, Encoders.bean(List.class)); Thanks!

Re: [MLlib] RowMatrix computeSVD Native ARPACK support not detecting.

2017-10-09 Thread Weichen Xu
Do you get warning info such as: `Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS` `Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS`? These two warnings are thrown in `com.github.fommil.netlib.BLAS`, but it catches the original exception
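A quick way to see what netlib-java actually resolved, run on the same JVM and classpath as the job (a sketch; these getInstance calls are the library's entry points). Names starting with F2j mean the pure-Java fallback is in use:

```scala
// Prints e.g. com.github.fommil.netlib.F2jBLAS (pure-Java fallback) or
// com.github.fommil.netlib.NativeSystemBLAS (native libraries found).
println(com.github.fommil.netlib.BLAS.getInstance().getClass.getName)
println(com.github.fommil.netlib.LAPACK.getInstance().getClass.getName)
println(com.github.fommil.netlib.ARPACK.getInstance().getClass.getName)
```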

How to avoid creating meta files (.crc files)

2017-10-09 Thread Vikash Pareek
Hi Users, Is there any way to avoid the creation of .crc files when writing an RDD with the saveAsTextFile method? My use case: I have mounted S3 on the local file system using S3FS and am saving an RDD to the mount point. Looking at S3, I found one .crc file for each part file and even for the _SUCCESS file.
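One possible angle (a hedged sketch, not a verified fix): the .crc files come from Hadoop's ChecksumFileSystem wrapper around the local filesystem, and FileSystem exposes setWriteChecksum to turn that off. In local mode this can be applied to the cached FileSystem instance before writing; on a cluster each executor resolves its own instance, so this may not carry over. sc, rdd, and the mount path below are assumptions:

```scala
import java.net.URI
import org.apache.hadoop.fs.FileSystem

// Grab the (cached) local FileSystem and disable checksum generation.
val fs = FileSystem.get(new URI("file:///"), sc.hadoopConfiguration)
fs.setWriteChecksum(false)

// Subsequent local writes through this instance skip the .crc side files.
rdd.saveAsTextFile("file:///mnt/s3fs/output") // assumed mount point
```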

Fwd: [MLlib] RowMatrix computeSVD Native ARPACK support not detecting.

2017-10-09 Thread Abdullah Bashir
Hi, I am getting the following warning when I run the pyspark job. My code is: mat = RowMatrix(tf_rdd_vec.cache())  # RDD is cached svd = mat.computeSVD(num_topics, computeU=False) I am using an Ubuntu 16.04 EC2 instance, and I have installed all of the following libraries on my system: sudo apt

RE: Equivalent of Redshift ListAgg function in Spark (Pyspak)

2017-10-09 Thread Mahesh Sawaiker
After doing the group, you can use mkString on the data frame. Following is an example where all columns are concatenated with a space as the separator. scala> call_cdf.map(row => row.mkString(" ")).show(false)
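A closer LISTAGG analogue, when only one column needs to be aggregated per group, is collect_list combined with concat_ws (a sketch with assumed column names on the call_cdf frame from above):

```scala
import org.apache.spark.sql.functions.{collect_list, concat_ws}

// Concatenate the values of one column into a comma-separated string
// per group, like Redshift's LISTAGG(called_number, ',').
val agg = call_cdf
  .groupBy("caller_id")
  .agg(concat_ws(",", collect_list("called_number")).as("numbers"))
```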