Re: How to run Zeppelin and Spark Thrift Server Together

2016-07-17 Thread Chanh Le
Hi Ayan, I succeeded with Tableau, but I still can’t import metadata from Hive into Oracle BI. It seems Oracle BI still can’t connect to STS. Regards, Chanh > On Jul 15, 2016, at 11:44 AM, ayan guha wrote: > > It’s possible that the transfer protocols are not matching, that’s

Re: Filtering RDD Using Spark.mllib's ChiSqSelector

2016-07-17 Thread Yanbo Liang
Hi Tobi, Thanks for clarifying the question. It's very straightforward to convert the filtered RDD to a DataFrame; you can refer to the following code snippet: from pyspark.sql import Row rdd2 = filteredRDD.map(lambda v: Row(features=v)) df = rdd2.toDF() Thanks Yanbo 2016-07-16 14:51 GMT-07:00
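
For anyone doing the same in Scala, a minimal equivalent sketch, assuming filteredRDD is an RDD[Vector] coming out of ChiSqSelector and sqlContext is an existing SQLContext:

    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.rdd.RDD

    // filteredRDD: RDD[Vector] produced by ChiSqSelector.transform
    import sqlContext.implicits._
    val df = filteredRDD.map(Tuple1.apply).toDF("features")
    df.printSchema()  // root |-- features: vector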

Re: scala.MatchError on stand-alone cluster mode

2016-07-17 Thread Mekal Zheng
Hi Rishabh Bhardwaj, Saisai Shao, thanks for your help. I have found that the key reason is that I forgot to upload the jar package to all of the nodes in the cluster, so after the master distributed the job and selected one node as the driver, the driver could not find the jar package and threw an exception.
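
For reference, a common alternative to copying the jar to every node by hand is to let Spark ship it with the job; a minimal sketch, with the jar path as a placeholder:

    import org.apache.spark.{SparkConf, SparkContext}

    // Ship the application jar with the job so the driver and executors can load its classes.
    // The same effect can be had with `spark-submit --jars /path/to/app.jar`.
    val conf = new SparkConf()
      .setAppName("my-app")
      .setJars(Seq("/path/to/app.jar"))  // placeholder path
    val sc = new SparkContext(conf)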

Re: Spark streaming takes longer time to read json into dataframes

2016-07-17 Thread Diwakar Dhanuskodi
Hi, Repartitioning would create a shuffle over the network, which I should avoid to reduce processing time, because the total size of messages in a batch can be up to 5G. Partitioning the topic and parallelizing receiving in the Direct Stream might do the trick. Sent from Samsung Mobile.
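
A minimal sketch of the direct-stream approach described above, assuming the spark-streaming-kafka 0.8 connector, a broker at broker1:9092, and a topic named "events" (all placeholder names); each Kafka partition becomes an RDD partition, so adding topic partitions raises read parallelism without a repartition shuffle:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("direct-stream-example")
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val topics = Set("events")  // one RDD partition per Kafka topic partition

    // No receiver and no repartition shuffle: each Kafka partition is read in parallel.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    stream.map(_._2).foreachRDD { rdd =>
      // parse the JSON payloads here, e.g. with sqlContext.read.json(rdd)
    }

    ssc.start()
    ssc.awaitTermination()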

Re: Dataframe Transformation with Inner fields in Complex Datatypes.

2016-07-17 Thread ayan guha
Hi, withColumn adds a column. If you want a different name, please use the .alias() function. On Mon, Jul 18, 2016 at 2:16 AM, java bigdata wrote: > Hi Team, > > I am facing a major issue while transforming a dataframe containing complex > datatype columns. I need to update the
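
A small sketch of the distinction being made, assuming a DataFrame df with a string column name (placeholder names):

    import org.apache.spark.sql.functions.{col, upper}

    // withColumn adds (or replaces) a column on the DataFrame under the given name...
    val added = df.withColumn("name_upper", upper(col("name")))

    // ...whereas alias() renames an expression inside a select.
    val renamed = df.select(upper(col("name")).alias("name_upper"))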

Re: How to recommend most similar users using Spark ML

2016-07-17 Thread Karl Higley
There are also some Spark packages for finding approximate nearest neighbors using locality sensitive hashing: https://spark-packages.org/?q=tags%3Alsh On Fri, Jul 15, 2016 at 7:45 AM nguyen duc Tuan wrote: > Hi jeremycod, > If you want to find top N nearest neighbors for
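
Those packages each have their own APIs; purely as an illustration of what an LSH-based neighbour lookup looks like, here is a sketch using the BucketedRandomProjectionLSH estimator available in newer Spark ML releases (2.1+), assuming usersDf has an id column and a features vector column (placeholder names):

    import org.apache.spark.ml.feature.BucketedRandomProjectionLSH
    import org.apache.spark.ml.linalg.Vectors

    val lsh = new BucketedRandomProjectionLSH()
      .setInputCol("features")
      .setOutputCol("hashes")
      .setBucketLength(2.0)
      .setNumHashTables(3)

    val model = lsh.fit(usersDf)

    // Approximate top-10 most similar users to a query vector.
    val key = Vectors.dense(0.1, 0.3, 0.5)
    val neighbors = model.approxNearestNeighbors(usersDf, key, 10)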

How to use Spark scala custom UDF in spark sql CLI or beeline client

2016-07-17 Thread pooja mehta
Hi, how can I use a Spark Scala custom UDF in the spark-sql CLI or Beeline client? With sqlContext we can register a UDF like this: sqlContext.udf.register("sample_fn", sample_fn _) What is the way to use the UDF in the Spark SQL CLI or Beeline client? Thanks Pooja
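
One way this is commonly done (not confirmed in this thread) is to package the function as a Hive UDF and register it from the CLI or Beeline with CREATE TEMPORARY FUNCTION; a sketch, with class name, jar path, and table name as placeholders:

    package com.example.udf

    import org.apache.hadoop.hive.ql.exec.UDF

    // A plain Hive UDF: spark-sql and beeline resolve functions through the Hive registry,
    // so this class (built into a jar with hive-exec on the compile classpath) can be used there.
    class SampleFn extends UDF {
      def evaluate(s: String): String = if (s == null) null else s.toUpperCase
    }

    // Then, inside the spark-sql CLI or beeline:
    //   ADD JAR /path/to/sample-udf.jar;
    //   CREATE TEMPORARY FUNCTION sample_fn AS 'com.example.udf.SampleFn';
    //   SELECT sample_fn(name) FROM people;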

Dataframe Transformation with Inner fields in Complex Datatypes.

2016-07-17 Thread java bigdata
Hi Team, I am facing a major issue while transforming a dataframe containing complex datatype columns. I need to update the inner fields of a complex datatype, e.g. converting one inner field to UPPERCASE letters, and return the same dataframe with the new uppercase values in it. Below is my issue
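
A minimal sketch of the usual pattern for this, assuming df has a struct column address with inner fields city and zip (placeholder names); struct fields can't be updated in place, so the struct is rebuilt with the transformed field and written back over the original column:

    import org.apache.spark.sql.functions.{col, struct, upper}

    // Rebuild the struct with `city` upper-cased, keeping the other inner field unchanged,
    // then overwrite the original column so the rest of the schema stays the same.
    val transformed = df.withColumn(
      "address",
      struct(
        upper(col("address.city")).alias("city"),
        col("address.zip").alias("zip")
      )
    )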

unsubscribe

2016-07-17 Thread Burger, Robert
Robert Burger | Solutions Design IT Specialist | CBAW TS | TD Wealth Technology Solutions 79 Wellington Street West, 17th Floor, TD South Tower, Toronto, ON, M5K 1A2 If you wish to unsubscribe from receiving commercial electronic messages from TD Bank Group, please click here or go to the

Re: Spark (on Windows) not picking up HADOOP_CONF_DIR

2016-07-17 Thread Jacek Laskowski
Hi, How did you set it? How do you run the app? Use sys.env to know whether it was set or not. Jacek On 17 Jul 2016 11:33 a.m., "Daniel Haviv" wrote: > Hi, > I'm running Spark using IntelliJ on Windows and even though I set > HADOOP_CONF_DIR it does not affect
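
A quick sketch of that check, plus one way to work around it when launching from an IDE; the paths are placeholders and sc is an existing SparkContext:

    // Was the variable visible to the JVM at all?
    println(sys.env.get("HADOOP_CONF_DIR"))

    // If it was not picked up, the Hadoop config files can be added explicitly.
    import org.apache.hadoop.fs.Path
    sc.hadoopConfiguration.addResource(new Path("C:/hadoop/conf/core-site.xml"))
    sc.hadoopConfiguration.addResource(new Path("C:/hadoop/conf/hdfs-site.xml"))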

Re: How can we control CPU and Memory per Spark job operation..

2016-07-17 Thread Jacek Laskowski
Hi, How would that help?! Why would you do that? Jacek On 17 Jul 2016 7:19 a.m., "Pedro Rodriguez" wrote: > You could call map on an RDD which has “many” partitions, then call > repartition/coalesce to drastically reduce the number of partitions so that > your second
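
For context, a minimal sketch of what the quoted suggestion describes, with rdd and expensiveTransform as placeholders:

    // Run the expensive map with the RDD's existing (high) parallelism...
    val mapped = rdd.map(expensiveTransform)

    // ...then shrink the number of partitions before a lighter downstream stage.
    val narrowed = mapped.coalesce(8)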

Spark (on Windows) not picking up HADOOP_CONF_DIR

2016-07-17 Thread Daniel Haviv
Hi, I'm running Spark using IntelliJ on Windows, and even though I set HADOOP_CONF_DIR it does not affect the contents of sc.hadoopConfiguration. Has anybody encountered this? Thanks, Daniel