Re: Apache Spark - Custom structured streaming data source

2018-01-25 Thread Tathagata Das
Hello Mans, The streaming DataSource APIs are still evolving and are not public yet, hence there is no official documentation. In fact, there is a new DataSourceV2 API (in Spark 2.3) that we are migrating towards, so at this point in time it's hard to make any concrete suggestion. You can take a

Re: how to create a DataType Object using the String representation in Java using Spark 2.2.0?

2018-01-25 Thread Kurt Fehlhauer
Can you share your code and a sample of your data? Without seeing it, I can't give a definitive answer, but I can offer some hints. If you have a column of strings, you should be able to create a new column cast to Integer. This can be accomplished two ways: df.withColumn("newColumn",
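Completing the truncated suggestion, a minimal sketch of the two cast approaches against a local SparkSession (the DataFrame contents and the column names "value"/"newColumn" are illustrative, not from the original mail):

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import static org.apache.spark.sql.functions.col;

public class CastExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("cast-example").master("local[*]").getOrCreate();

        // A one-column DataFrame of numeric strings.
        Dataset<Row> df = spark
                .createDataset(Arrays.asList("1", "2", "3"), Encoders.STRING())
                .toDF("value");

        // Way 1: cast using the type's name.
        Dataset<Row> byName =
                df.withColumn("newColumn", col("value").cast("integer"));

        // Way 2: cast using a DataType object.
        Dataset<Row> byType =
                df.withColumn("newColumn", col("value").cast(DataTypes.IntegerType));

        byName.printSchema(); // newColumn ends up as integer either way
        byType.printSchema();
        spark.stop();
    }
}
```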

Re: how to create a DataType Object using the String representation in Java using Spark 2.2.0?

2018-01-25 Thread kant kodali
It seems like it's hard to construct a DataType given its String literal representation. dataframe.dtypes() returns column names and their corresponding types. For example, say I have an integer column named "sum"; doing dataframe.dtypes() would return "sum" and "IntegerType", but this string
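For reference, a sketch of two ways a DataType can be rebuilt from a string in Spark 2.2. Note the caveat the poster runs into: dtypes() returns Scala-style names like "IntegerType", while these parsers expect "integer"/"int", so some name munging would still be needed. CatalystSqlParser lives in an internal package, so this is an assumption-laden sketch rather than a supported recipe:

```java
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser;
import org.apache.spark.sql.types.DataType;

public class DataTypeFromString {
    public static void main(String[] args) {
        // DataType.fromJson accepts the JSON encoding used by StructType#json;
        // simple types are encoded as bare JSON strings such as "integer".
        DataType fromJson = DataType.fromJson("\"integer\"");

        // The Catalyst SQL parser accepts DDL-style names such as "int",
        // but it is an internal class and may change between releases.
        DataType fromDdl = CatalystSqlParser.parseDataType("int");

        System.out.println(fromJson.simpleString());
        System.out.println(fromDdl.simpleString());
    }
}
```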

Apache Spark - Custom structured streaming data source

2018-01-25 Thread M Singh
Hi: I am trying to create a custom structured streaming source and would like to know if there is any example or documentation on the steps involved. I've looked at some of the methods available in the SparkSession, but these are internal to the sql package:   private[sql] def

Spark Standalone Mode, application runs, but executor is killed

2018-01-25 Thread Chandu
Hi, I tried my question at stackoverflow.com ( https://stackoverflow.com/questions/48445145/spark-standalone-mode-application-runs-but-executor-is-killed-with-exitstatus ), but it is yet to be answered, so I thought I would try the user group. I am new to Apache Spark and was trying to run the example Pi
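For context, a typical standalone-mode submission of the Pi example looks roughly like the following; the master URL, memory/core settings, and jar path are placeholders to adjust for your install. When executors are killed, the executor logs under the worker's work/ directory and the memory settings are a common first place to look:

```shell
# Submit the SparkPi example to a standalone master.
# spark://master-host:7077 and the jar path are placeholders.
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://master-host:7077 \
  --executor-memory 1g \
  --total-executor-cores 2 \
  examples/jars/spark-examples_2.11-2.2.1.jar 100
```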

how to create a DataType Object using the String representation in Java using Spark 2.2.0?

2018-01-25 Thread kant kodali
Hi All, I have a datatype "IntegerType" represented as a String and now I want to create a DataType object out of that. I couldn't find anything in the DataType or DataTypes API on how to do that. Thanks!

Re: Get broadcast (set in one method) in another method

2018-01-25 Thread Gourav Sengupta
Hi, Just out of curiosity, what sort of programming or design paradigm does this way of solving things fit into? In case you are trying functional programming, do you think that currying will help? Regards, Gourav Sengupta On Thu, Jan 25, 2018 at 8:04 PM, Margusja wrote: >

Get broadcast (set in one method) in another method

2018-01-25 Thread Margusja
Hi Maybe I am overthinking. I'd like to set a broadcast variable in object A method y and get it in object A method x. For example: object A { def main (args: Array[String]) { y() x() } def x() : Unit = { val a = bcA.value ... } def y(): String = { val bcA =
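One common way out, sketched here in Java against a local SparkSession: hold the broadcast handle in a field (or pass it as a parameter) instead of declaring it as a local inside y, so both methods can see it. The names bcA, x, and y mirror the post; everything else is illustrative:

```java
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.sql.SparkSession;

public class A {
    // A field (rather than a local inside y) makes the broadcast
    // handle visible to both methods.
    private static Broadcast<String> bcA;

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("broadcast-sharing").master("local[*]").getOrCreate();
        JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
        y(jsc); // sets the broadcast
        x();    // reads it
        spark.stop();
    }

    static void y(JavaSparkContext jsc) {
        bcA = jsc.broadcast("some shared value");
    }

    static void x() {
        String a = bcA.value();
        System.out.println(a); // prints "some shared value"
    }
}
```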

Custom build - missing images on MasterWebUI

2018-01-25 Thread Conconscious
Hi list, I'm trying to make a custom build of Spark, but in the end there are no images on the Web UI. Some help, please. Build from: git checkout v2.2.1 ./dev/make-distribution.sh --name custom-spark --pip --tgz -Psparkr -Phadoop-2.7 -Dhadoop.version=2.7.3 -Phive -Phive-thriftserver -Pmesos -Pyarn

Re: Apache Hadoop and Spark

2018-01-25 Thread jamison.bennett
Hi Mutahir, I will try to answer some of your questions. Q1) Can we use MapReduce and Apache Spark in the same cluster? Yes. I run a cluster with both MapReduce2 and Spark, and I use YARN as the resource manager. Q2) Is it mandatory to use GPUs for Apache Spark? No. My cluster has Spark and does

Re: S3 token times out during data frame "write.csv"

2018-01-25 Thread Jean Georges Perrin
Are you writing from an Amazon instance or from an on-premise install to S3? How many partitions are you writing from? Maybe you can try to “play” with repartitioning to see how it behaves? > On Jan 23, 2018, at 17:09, Vasyl Harasymiv wrote: > > It is about 400
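One way to experiment with that, as a sketch: reduce the number of output partitions before the write so fewer tasks hit S3 concurrently. The bucket path, the partition count, and the sample data are placeholders, and s3a credentials are assumed to be configured already:

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class RepartitionedWrite {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("s3-write").master("local[*]").getOrCreate();

        Dataset<Row> df = spark
                .createDataset(Arrays.asList("a", "b", "c"), Encoders.STRING())
                .toDF("value");

        // Fewer output partitions means fewer concurrent S3 writers,
        // which can help when requests time out; 10 is an arbitrary start.
        df.coalesce(10)
          .write()
          .mode("overwrite")
          .csv("s3a://my-bucket/output/"); // placeholder path

        spark.stop();
    }
}
```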

Kafka deserialization to Structured Streaming SQL - Encoders.bean result doesn't match itself?

2018-01-25 Thread Iain Cundy
Hi All I'm trying to move from MapWithState to Structured Streaming v2.2.1, but I've run into a problem. To convert Kafka data with a binary (protobuf) value to SQL, I'm taking the dataset from readStream and doing Dataset<Row> s = dataset.selectExpr("timestamp", "CAST(key as string)",
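For context, the general shape of that readStream pipeline looks roughly like this; the broker address and topic name are placeholders, and everything beyond the selectExpr is reconstructed from the snippet rather than taken from the original code:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class KafkaToSql {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("kafka-to-sql").master("local[*]").getOrCreate();

        // Kafka exposes key/value as binary; the CAST turns the key into
        // a string, while the protobuf value stays binary for a later
        // deserialization step (e.g. a map(...) with a bean encoder).
        Dataset<Row> s = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "host:9092") // placeholder
                .option("subscribe", "my-topic")                // placeholder
                .load()
                .selectExpr("timestamp", "CAST(key AS STRING)", "value");

        s.printSchema();
        spark.stop();
    }
}
```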