Re: PIG to Spark

2018-01-08 Thread Jeff Zhang
Pig supports the Spark engine now, so you can leverage Spark execution with a Pig script. I am afraid there's no solution to convert a Pig script to Spark API code. Pralabh Kumar wrote on Monday, January 8, 2018 at 11:25 PM: > Hi > > Is there a convenient way /open source project to convert PIG

Re: Is Apache Spark-2.2.1 compatible with Hadoop-3.0.0

2018-01-08 Thread Felix Cheung
And Hadoop-3.x is not part of the release and sign off for 2.2.1. Maybe we could update the website to avoid any confusion with "later". From: Josh Rosen Sent: Monday, January 8, 2018 10:17:14 AM To: akshay naidu Cc: Saisai Shao; Raj

select with more than 5 typed columns

2018-01-08 Thread Nathan Kronenfeld
Looking in Dataset, there are select functions taking from 1 to 5 TypedColumn arguments. Is there a built-in way to pull out more than 5 typed columns into a Dataset (without having to resort to using a DataFrame, or manual processing of the RDD)? Thanks, - Nathan Kronenfeld

Re: Spark Monitoring using Jolokia

2018-01-08 Thread Thakrar, Jayesh
And here's some more info on Spark Metrics https://www.slideshare.net/JayeshThakrar/apache-bigdata2017sparkprofiling From: Maximiliano Felice Date: Monday, January 8, 2018 at 8:14 AM To: Irtiza Ali Cc: Subject: Re: Spark

Spark MakeRDD preferred workers

2018-01-08 Thread Christopher Piggott
Hi, def makeRDD[T](seq: Seq[(T, Seq[String])])(implicit arg0: ClassTag[T]): RDD[T] list of tuples of data and location preferences (hostnames of Spark nodes) Is that list a list of acceptable choices, and it will choose one of them? Or is it an ordered list? I'm trying to ascertain how

Re: Is Apache Spark-2.2.1 compatible with Hadoop-3.0.0

2018-01-08 Thread Josh Rosen
My current best guess is that Spark does *not* fully support Hadoop 3.x because https://issues.apache.org/jira/browse/SPARK-18673 (updates to Hive shims for Hadoop 3.x) has not been resolved. There are also likely to be transitive dependency conflicts which will need to be resolved. On Mon, Jan

Re: Is Apache Spark-2.2.1 compatible with Hadoop-3.0.0

2018-01-08 Thread akshay naidu
Yes, the Spark download page does mention that 2.2.1 is for 'hadoop-2.7 and later', but my confusion is because Spark was released on 1st Dec and the hadoop-3 stable version was released on 13th Dec. And to my similar question on stackoverflow.com

PIG to Spark

2018-01-08 Thread Pralabh Kumar
Hi, is there a convenient way or open source project to convert PIG scripts to Spark? Regards, Pralabh Kumar

Spark structured streaming time series forecasting

2018-01-08 Thread Bogdan Cojocar
Hello, Is there a method to do time series forecasting in spark structured streaming? Is there any integration going on with spark-ts or a similar library? Many thanks, Bogdan Cojocar

binaryFiles() on directory full of directories

2018-01-08 Thread Christopher Piggott
I have a top level directory in HDFS that contains nothing but subdirectories (no actual files). In each one of those subdirs are a combination of files and other subdirs /topdir/dir1/(lots of files) /topdir/dir2/(lots of files) /topdir/dir2//subdir/(lots of files) I

Re: Spark Monitoring using Jolokia

2018-01-08 Thread Maximiliano Felice
Hi! I don't know very much about them, but I'm currently working on posting custom metrics into Graphite. I found the internals described in this library useful: https://github.com/groupon/spark-metrics Hope this at least can give you a hint. Best of

Spark Monitoring using Jolokia

2018-01-08 Thread Irtiza Ali
Hello everyone, I am building a monitoring tool for Spark, and for that I need Spark's metrics. I am using Jolokia to get the metrics. I have two questions: Can I get all the metrics provided by the Spark REST API using Jolokia? How does the Spark REST API get the metrics internally? Thanks

Reverse MinMaxScaler in SparkML

2018-01-08 Thread Tomasz Dudek
Hello, since the similar question on StackOverflow remains unanswered ( https://stackoverflow.com/questions/46092114/is-there-no-inverse-transform-method-for-a-scaler-like-minmaxscaler-in-spark ) and perhaps there is a solution that I am not aware of, I'll ask: After training MinMaxScaler(or
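
For reference, a minimal sketch of what "reversing" a min-max scaling amounts to arithmetically. It is plain Python and does not call Spark. It assumes the rescaling formula given in the Spark ML documentation for MinMaxScaler, (x - E_min) / (E_max - E_min) * (max - min) + min, and the function names `scale` and `unscale` are hypothetical helpers, not part of any Spark API.

```python
def scale(x, e_min, e_max, lo=0.0, hi=1.0):
    """Rescale x from the observed range [e_min, e_max] to [lo, hi]."""
    if e_max == e_min:
        # Constant feature: the documented behaviour maps it to the midpoint.
        return 0.5 * (hi + lo)
    return (x - e_min) / (e_max - e_min) * (hi - lo) + lo


def unscale(y, e_min, e_max, lo=0.0, hi=1.0):
    """Invert scale(): map y from [lo, hi] back to [e_min, e_max]."""
    if e_max == e_min:
        # Every original value was the same, so that value is the only answer.
        return e_min
    if hi == lo:
        raise ValueError("target range is degenerate; scaling is not invertible")
    return (y - lo) / (hi - lo) * (e_max - e_min) + e_min


if __name__ == "__main__":
    original = 7.0
    scaled = scale(original, e_min=2.0, e_max=12.0)
    print(scaled, unscale(scaled, e_min=2.0, e_max=12.0))
```

The inverse needs the per-feature minimum and maximum observed during fitting as well as the target range, so those values have to be kept alongside the scaled data.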