Dataset API inconsistencies

2018-01-09 Thread Alex Nastetsky
I am finding the Dataset API very cumbersome to use, which is unfortunate, as I was looking forward to the type-safety after coming from a DataFrame codebase. This link summarizes my troubles: http://loicdescotte.github.io/posts/spark2-datasets-type-safety/ The problem is having to
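For context, a minimal Scala sketch of the kind of friction the linked post describes (the Person case class is illustrative): untyped column references compile even when misspelled, while the type-checked alternative is more verbose and drops down to a bare Dataset[Long]:

    import org.apache.spark.sql.SparkSession

    case class Person(name: String, age: Long)

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    val people = Seq(Person("Ann", 30), Person("Bob", 25)).toDS()

    // Compiles fine but fails only at runtime: "agee" is never type-checked.
    // people.select($"agee")

    // The typed route is checked, but verbose, and loses the case class:
    val ages = people.select($"age".as[Long])  // Dataset[Long], not Dataset[Person]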

Re: Spark structured streaming time series forecasting

2018-01-09 Thread Tathagata Das
Spark-ts has not been under development for a while, so I doubt there is any integration with Structured Streaming. That said, Structured Streaming uses DataFrames and Datasets, and many existing libraries built on Datasets/DataFrames should work directly, especially if they are map-like
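A minimal sketch of that point, assuming a socket source and a stand-in enrich() function in place of a real library call: a streaming Dataset accepts the same map-like operators as a batch one, so record-at-a-time functions usually apply unchanged:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("ss-demo").getOrCreate()
    import spark.implicits._

    // Stand-in for a per-record function from an existing Dataset-based library.
    def enrich(s: String): String = s.toUpperCase

    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()
      .as[String]

    // The same .map() used in batch code works on the streaming Dataset.
    val query = lines.map(enrich).writeStream
      .format("console")
      .start()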

How to create security filter for Spark UI in Spark on YARN

2018-01-09 Thread Jhon Anderson Cardenas Diaz
*Environment*: AWS EMR, yarn cluster. *Description*: I am trying to use a Java servlet filter to protect access to the Spark UI via the property spark.ui.filters; the problem is that when Spark is running in yarn mode, that property is always being overridden by Hadoop with the filter
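For reference, this is roughly how such a filter is wired up; com.example.AuthFilter and its init parameter are hypothetical, and on YARN Hadoop's AmIpFilter may still be injected on top of whatever is configured here:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("ui-filter-demo")
      // Servlet filter class protecting the Spark UI; must be on the driver classpath.
      .config("spark.ui.filters", "com.example.AuthFilter")
      // Filter init params follow the pattern spark.<filter class>.param.<name>=<value>.
      .config("spark.com.example.AuthFilter.param.role", "admin")
      .getOrCreate()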

Spark UI stdout/stderr links point to executors internal address

2018-01-09 Thread Jhon Anderson Cardenas Diaz
*Environment:* AWS EMR, yarn cluster. *Description:* On the Spark UI, in the Environment and Executors tabs, the stdout and stderr links point to the internal addresses of the executors. This would mean exposing the executors so that the links can be accessed. Shouldn't those links be pointed to

Re: Palantir releases under org.apache.spark?

2018-01-09 Thread Andrew Ash
That source repo is at https://github.com/palantir/spark/ with artifacts published to Palantir's bintray at https://palantir.bintray.com/releases/org/apache/spark/. If you're seeing any of them in Maven Central, please flag it, as that's a mistake! Andrew On Tue, Jan 9, 2018 at 10:10 AM, Sean Owen

Re: Palantir releases under org.apache.spark?

2018-01-09 Thread Sean Owen
Just to follow up -- those are actually in a Palantir repo, not Central. Deploying to Central would be discourteous, but this approach is legitimate and is how it has to work for vendors to release distros of Spark etc. On Tue, Jan 9, 2018 at 11:43 AM Nan Zhu wrote: > Hi,

Re: Palantir releases under org.apache.spark?

2018-01-09 Thread Nan Zhu
nvm On Tue, Jan 9, 2018 at 9:42 AM, Nan Zhu wrote: > Hi, all > > Out of curiosity, I just found a bunch of Palantir releases under > org.apache.spark in Maven Central (https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11)? > > Is it on purpose? > > Best, >

Palantir releases under org.apache.spark?

2018-01-09 Thread Nan Zhu
Hi, all Out of curiosity, I just found a bunch of Palantir releases under org.apache.spark in Maven Central (https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11)? Is it on purpose? Best, Nan

[SPARK-CORE] JVM Properties passed as -D, not being found inside UDAF classes

2018-01-09 Thread Uchoa, Rodrigo
Hi everyone! I'm trying to pass -D properties (JVM system properties) to a Spark application where we have some UDAFs (User Defined Aggregate Functions) that will read those properties (using System.getProperty()). The problem is, the properties are never there when we try to read them. According to
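One likely explanation, sketched below: -D flags given to spark-submit reach the driver JVM only, while UDAFs execute on the executors, so the property has to travel via spark.executor.extraJavaOptions (my.custom.prop is a hypothetical property name):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("udaf-props-demo")
      // Propagate the system property to every executor JVM, where the UDAF runs.
      .config("spark.executor.extraJavaOptions", "-Dmy.custom.prop=some-value")
      .getOrCreate()

    // Inside the UDAF, on an executor, the property now resolves:
    // val v = System.getProperty("my.custom.prop")  // "some-value"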

Re: PIG to Spark

2018-01-09 Thread Gourav Sengupta
I may be wrong here, but when I look at Apache Pig on GitHub it shows 8 contributors, and when I look at Apache Spark it shows more than 1000 contributors. And if the above is true, I ask myself: why not shift to SPARK by learning it? I also started with map

Re: Spark Monitoring using Jolokia

2018-01-09 Thread Gourav Sengupta
Hi, I am totally confused here, maybe because I do not exactly understand this, but why is this required? I have always used the SPARK UI and found it more than sufficient. And if you know a bit about how a SPARK session works, then your performance does have a certain degree of predictability as