Re: Spark DataFrameNaFunctions unrecognized

2016-02-15 Thread satish chandra j
Hi Ted, Please find the error below: [ERROR] C:\workspace\etl\src\main\scala\stg_mds_pmds_rec_df.scala:116: error: value na is not a member of org.apache.spark.sql.DataFrame [ERROR] var nw_cmpr_df=cmpr_df.na.fill("column1",) [ERROR] Please let me know if any further details
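A hedged aside: on a build that really compiles against Spark 1.3.1 or later, na is available on DataFrame, and the trailing comma in the quoted call would not compile either. A minimal sketch of valid fill usage, with an invented replacement value:

    // replace nulls in "column1" with a placeholder string
    val nw_cmpr_df = cmpr_df.na.fill("NA", Seq("column1"))
    // or give per-column replacement values
    val filled = cmpr_df.na.fill(Map("column1" -> "NA"))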

Re: Spark DataFrameNaFunctions unrecognized

2016-02-15 Thread Ted Yu
bq. I am getting compile time error Do you mind pastebin'ning the error you got ? Cheers On Mon, Feb 15, 2016 at 11:08 PM, satish chandra j wrote: > HI Ted, > I understand it works fine if executed in Spark Shell > Sorry, I missed to mention that I am getting

Saving Kafka Offsets to Cassandra at beginning of each batch in Spark Streaming

2016-02-15 Thread Abhishek Anand
I have a Kafka RDD and I need to save the offsets to a Cassandra table at the beginning of each batch. Basically I need to write the offsets of the type Offsets below, which I am getting inside foreachRDD, to Cassandra. The javaFunctions API to write to Cassandra needs an RDD. How can I create an RDD
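The poster is on the Java API, but a hedged Scala sketch of the idea is to parallelize the offset ranges captured in foreachRDD and save them with the Cassandra connector (keyspace, table and column names are invented):

    import com.datastax.spark.connector._
    import org.apache.spark.streaming.kafka.HasOffsetRanges

    kafkaStream.foreachRDD { rdd =>
      val offsets = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // one row per topic/partition: (topic, partition, fromOffset, untilOffset)
      val offsetRdd = rdd.sparkContext.parallelize(
        offsets.map(o => (o.topic, o.partition, o.fromOffset, o.untilOffset)))
      offsetRdd.saveToCassandra("my_ks", "kafka_offsets",
        SomeColumns("topic", "partition", "from_offset", "until_offset"))
      // ...then process rdd as usual...
    }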

Re: Creating HiveContext in Spark-Shell fails

2016-02-15 Thread Gavin Yue
This sqlContext is an instance of HiveContext; do not be confused by the name. > On Feb 16, 2016, at 12:51, Prabhu Joseph wrote: > > Hi All, > > On creating HiveContext in spark-shell, fails with > > Caused by: ERROR XSDB6: Another instance of Derby may
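A quick, hedged check in spark-shell that the pre-created context is already Hive-enabled, so no second HiveContext (and no second Derby metastore) is needed:

    // the sqlContext created by spark-shell is already a HiveContext on a Hive-enabled build
    sqlContext.isInstanceOf[org.apache.spark.sql.hive.HiveContext]  // true
    sqlContext.sql("SHOW TABLES").show()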

Re: Spark DataFrameNaFunctions unrecognized

2016-02-15 Thread satish chandra j
Hi Ted, I understand it works fine if executed in the Spark shell. Sorry, I forgot to mention that I am getting a compile-time error (using Maven for the build). I am executing my Spark job on a remote client by submitting the jar file. Now do I need to import any specific packages to make DataFrameNaFunctions

Error when doing a SaveAstable on a Spark dataframe

2016-02-15 Thread SRK
Hi, I get an error when I do a SaveAsTable as shown below. I do have write access to the hive volume. Any idea as to why this is happening? val df = testDF.toDF("id", "rec") df.printSchema() val options = Map("path" -> "/hive/test.db/")
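For reference, a hedged sketch of writing a DataFrame as a table at an explicit path on Spark 1.4+ (format, path and table name are placeholders):

    val options = Map("path" -> "/hive/test.db/mytable")
    df.write
      .format("parquet")
      .options(options)
      .mode("overwrite")          // or "append"
      .saveAsTable("test.mytable")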

Re: Spark on Windows

2016-02-15 Thread UMESH CHAUDHARY
You can check the "spark.master" property in conf/spark-defaults.conf and try giving the IP of the VM in place of "localhost". On Tue, Feb 16, 2016 at 7:48 AM, KhajaAsmath Mohammed < mdkhajaasm...@gmail.com> wrote: > Hi, > > I am new to spark and starting working on it by writing small programs. I > am
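A hedged sketch of the programmatic equivalent when launching from an IDE (the IP is a placeholder for the VM's address):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("MyTest")
      .setMaster("spark://192.168.56.101:7077")  // instead of "localhost"
    val sc = new SparkContext(conf)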

Re: Side effects of using var inside a class object in a Rdd

2016-02-15 Thread Ted Yu
Age can be computed from the birthdate. Looks like it doesn't need to be a member of Animal class. If age is just for illustration, can you give an example which better mimics the scenario you work on ? Cheers On Mon, Feb 15, 2016 at 8:53 PM, Hemalatha A < hemalatha.amru...@googlemail.com>

Re: Creating HiveContext in Spark-Shell fails

2016-02-15 Thread Prabhu Joseph
Thanks Mark, that answers my question. On Tue, Feb 16, 2016 at 10:55 AM, Mark Hamstra wrote: > Welcome to > > [Spark shell ASCII-art banner] version 2.0.0-SNAPSHOT > >

Re: Creating HiveContext in Spark-Shell fails

2016-02-15 Thread Mark Hamstra
Welcome to [Spark shell ASCII-art banner] version 2.0.0-SNAPSHOT Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_72) Type in expressions to have them evaluated. Type

Side effects of using var inside a class object in a Rdd

2016-02-15 Thread Hemalatha A
Hello, I want to know what are the cons and performance impacts of using a var inside a class object in an RDD. Here is an example: Animal is a huge class with n number of val-type variables (approx >600 variables), but frequently we will have to update Age (just 1 variable) after some
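A hedged sketch of the var-free alternative suggested in the reply above: derive the one changing field instead of mutating it inside the RDD (field names are invented):

    // immutable record: age is derived, not stored in a var
    case class Animal(id: Long, birthYear: Int /*, ~600 other vals */) {
      def age(currentYear: Int): Int = currentYear - birthYear
    }

    // "updating" age becomes a pure transformation, no mutable state in the RDD
    val withAge = animalsRdd.map(a => (a.id, a.age(2016)))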

Creating HiveContext in Spark-Shell fails

2016-02-15 Thread Prabhu Joseph
Hi All, On creating HiveContext in spark-shell, fails with Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /SPARK/metastore_db. Spark-Shell already has created metastore_db for SqlContext. Spark context available as sc. SQL context available as

Re: recommendations with duplicate ratings

2016-02-15 Thread Nick Pentreath
Yes, for implicit data you need to sum up the "ratings" (actually view them as "weights") for each user-item pair. I do this in my ALS application. For e-commerce, say a "view" event has a weight of 1.0 and a "purchase" a weight of 3.0. Then adding multiple events together for a given user and
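A hedged sketch of that aggregation before calling trainImplicit (the input RDD and the rank/iteration values are illustrative):

    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    // sum the weights for each (user, item) pair so ALS sees one entry per pair
    val aggregated = rawRatings
      .map(r => ((r.user, r.product), r.rating))
      .reduceByKey(_ + _)
      .map { case ((user, product), weight) => Rating(user, product, weight) }

    val model = ALS.trainImplicit(aggregated, 10 /*rank*/, 10 /*iterations*/)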

Re: SparkSQL/DataFrame - Is `JOIN USING` syntax null-safe?

2016-02-15 Thread Zhong Wang
Just checked the code and wrote some tests. Seems it is not null-safe... Shall we consider providing a null-safe option for `JOIN USING` syntax? Zhong On Mon, Feb 15, 2016 at 7:25 PM, Zhong Wang wrote: > Is it null-safe when we use this interface? > -- > > def
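One hedged workaround in the meantime is to build the condition with the null-safe equality operator instead of USING (column names are placeholders):

    // <=> treats two nulls as equal, unlike the equality behind JOIN ... USING
    val joined = left.join(right, left("key") <=> right("key"), "inner")
    // note: unlike USING, both key columns are kept, so drop one if needed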

Re: New line lost in streaming output file

2016-02-15 Thread Ashutosh Kumar
Request to provide some pointer on this. Thanks On Mon, Feb 15, 2016 at 3:39 PM, Ashutosh Kumar wrote: > I am getting multiple empty files for streaming output for each interval. > To Avoid this I tried > > kStream.foreachRDD(new VoidFunction2(){ >

Getting java.lang.IllegalArgumentException: requirement failed while calling Spark's MLlib StreamingKMeans from a Java application

2016-02-15 Thread Yogesh Vyas
Hi, I am trying to run StreamingKMeans from a Java application, but it gives the following error: "Getting java.lang.IllegalArgumentException: requirement failed while calling Spark's MLlib StreamingKMeans from a Java application" Below is my code: JavaDStream v = trainingData.map(new

Re: IllegalArgumentException UnsatisfiedLinkError snappy-1.1.2 spark-shell error

2016-02-15 Thread Paolo Villaflores
Yes, I have seen that. But java.io.tmpdir has a default definition in Linux -- it is /tmp. On Tue, Feb 16, 2016 at 2:17 PM, Ted Yu wrote: > Have you seen this thread ? > > >

SparkSQL/DataFrame - Is `JOIN USING` syntax null-safe?

2016-02-15 Thread Zhong Wang
Is it null-safe when we use this interface? -- def join(right: DataFrame, usingColumns: Seq[String], joinType: String): DataFrame Thanks, Zhong

Re: IllegalArgumentException UnsatisfiedLinkError snappy-1.1.2 spark-shell error

2016-02-15 Thread Ted Yu
Have you seen this thread ? http://search-hadoop.com/m/q3RTtW43zT1e2nfb=Re+ibsnappyjava+so+failed+to+map+segment+from+shared+object On Mon, Feb 15, 2016 at 7:09 PM, Paolo Villaflores wrote: > > Hi, > > > > I am trying to run spark 1.6.0. > > I have previously just

Re: which is better RDD or Dataframe?

2016-02-15 Thread Ted Yu
Can you describe the types of query you want to perform ? If you don't already have a data flow which is optimized for RDD, I would suggest using the DataFrame API (or even the Dataset API), which gives the optimizer more room. Cheers On Mon, Feb 15, 2016 at 6:43 PM, Divya Gehlot

IllegalArgumentException UnsatisfiedLinkError snappy-1.1.2 spark-shell error

2016-02-15 Thread Paolo Villaflores
Hi, I am trying to run spark 1.6.0. I have previously just installed a fresh instance of hadoop 2.6.0 and hive 0.14. Hadoop, mapreduce, hive and beeline are working. However, as soon as I run `sc.textFile()` within spark-shell, it returns an error: $ spark-shell Welcome to
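If the /tmp filesystem is mounted noexec, the snappy native library cannot be mapped from there; a commonly suggested (hedged) workaround is to point java.io.tmpdir at an executable location, e.g.:

    spark-shell \
      --conf "spark.driver.extraJavaOptions=-Djava.io.tmpdir=/home/user/tmp" \
      --conf "spark.executor.extraJavaOptions=-Djava.io.tmpdir=/home/user/tmp"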

which is better RDD or Dataframe?

2016-02-15 Thread Divya Gehlot
Hi, I would like to know which gives better performance, RDDs or DataFrames? Like for one scenario: 1. Read the file as an RDD, register it as a temp table and fire a SQL query. 2. Read the file through the DataFrame API, or convert the RDD to a DataFrame and use DataFrame APIs to process the data. For the
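For concreteness, a hedged sketch of the two scenarios (file path and schema are invented):

    import sqlContext.implicits._
    import org.apache.spark.sql.functions.sum

    case class Sale(id: Int, amount: Double)
    val rdd = sc.textFile("/data/sales.csv")
      .map(_.split(","))
      .map(a => Sale(a(0).toInt, a(1).toDouble))

    // 1. RDD -> temp table -> SQL
    rdd.toDF().registerTempTable("sales")
    val bySql = sqlContext.sql("SELECT id, SUM(amount) FROM sales GROUP BY id")

    // 2. DataFrame API on the same data
    val byDf = rdd.toDF().groupBy("id").agg(sum("amount"))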

Spark on Windows

2016-02-15 Thread KhajaAsmath Mohammed
Hi, I am new to spark and starting working on it by writing small programs. I am able to run those in the cloudera quickstart VM but not able to run in eclipse when giving the master URL Steps I performed: Started Master and can access it through http://localhost:8080 Started worker and access

Re: How to run Scala file examples in spark 1.5.2

2016-02-15 Thread Ted Yu
bq. 150.142.11 The address above seems to be missing one octet. bq. org.apache.spark.examples/HdfsTest The slash should be a dot. Cheers On Mon, Feb 15, 2016 at 5:53 PM, Ashok Kumar wrote: > Thank you sir it is spark-examples-1.5.2-hadoop2.6.0.jar in mine > > Can you

Re: How to run Scala file examples in spark 1.5.2

2016-02-15 Thread Ted Yu
If you don't modify HdfsTest.scala, there is no need to rebuild it - it is contained in the examples jar coming with Spark release. You can use spark-submit to run the example. Cheers On Mon, Feb 15, 2016 at 5:24 PM, Ashok Kumar wrote: > Gurus, > > I am trying to
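A hedged example of the spark-submit invocation (master URL, jar location and input path are placeholders):

    spark-submit \
      --class org.apache.spark.examples.HdfsTest \
      --master spark://host:7077 \
      $SPARK_HOME/lib/spark-examples-1.5.2-hadoop2.6.0.jar \
      hdfs:///some/file.txt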

Re: Text search in Spark on compressed bz2 files

2016-02-15 Thread Mich Talebzadeh
On 16/02/2016 00:02, Mich Talebzadeh wrote: > Hi > > It does not seem that sc.textFile supports search on log files compressed > with bzip2 > > val logfile2 = sc.textFile("hdfs://rhes564:9000/test/REP_*.log.bz2") > > val df2 = logfile2.toDF("line") > val errors2 =

How to run Scala file examples in spark 1.5.2

2016-02-15 Thread Ashok Kumar
Gurus, I am trying to run some examples given under the directory spark/examples/src/main/scala/org/apache/spark/examples/ I am trying to run HdfsTest.scala However, when I run HdfsTest.scala against the spark shell it comes back with an error Spark context available as sc. SQL context available

Re: Migrating Transformers from Spark 1.3.1 to 1.5.0

2016-02-15 Thread Cesar Flores
I found my problem. I was calling setParameterValue(defaultValue) more than one time in the hierarchy of my classes. Thanks! On Mon, Feb 15, 2016 at 6:34 PM, Cesar Flores wrote: > > I have a set of transformers (each with specific parameters) in spark > 1.3.1. I have two

Migrating Transformers from Spark 1.3.1 to 1.5.0

2016-02-15 Thread Cesar Flores
I have a set of transformers (each with specific parameters) in spark 1.3.1. I have two versions, one that works and one that does not: 1.- working version //featureprovidertransformer contains already a set of ml params class DemographicTransformer(override val uid: String) extends

Re: Dataset takes more memory compared to RDD

2016-02-15 Thread Michael Armbrust
What algorithm? Can you provide code? On Fri, Feb 12, 2016 at 3:22 PM, Raghava Mutharaju < m.vijayaragh...@gmail.com> wrote: > Hello All, > > I implemented an algorithm using both the RDDs and the Dataset API (in > Spark 1.6). Dataset version takes lot more memory than the RDDs. Is this >

RE: Check if column exists in Schema

2016-02-15 Thread Mohammed Guller
The DataFrame class has a method named columns, which returns all column names as an array. You can then use the contains method in the Scala Array class to check whether a column exists. Mohammed Author: Big Data Analytics with
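A minimal sketch of that check, plus a hedged option for the nested-field case raised in the follow-up (column names are placeholders):

    // top-level column check
    val hasCol = df.columns.contains("myColumn")

    // nested fields: probe the resolver and catch the failure
    import scala.util.Try
    val hasNested = Try(df("some.path.to.my.date")).isSuccess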

Out of Memory error caused by output object in mapPartitions

2016-02-15 Thread nitinkak001
My mapPartition code as given below outputs one record for each input record. So, the output object has an equal number of records as the input. I am loading the output data into a ListBuffer object. This object is turning out to be too huge for memory, leading to an Out Of Memory exception. To be more
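One hedged fix is to return a lazily mapped iterator instead of materializing a ListBuffer, so only one record is held at a time (process stands in for the per-record logic):

    val out = rdd.mapPartitions { iter =>
      // the iterator is consumed lazily; nothing is buffered per partition
      iter.map(record => process(record))
    }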

Working out the optimizer matrix in Spark

2016-02-15 Thread Mich Talebzadeh
Hi, I would like to know if there are commands available with spark to allow one to see all active processes plus the details of each process. FYI I am aware of cluster information in the Spark GUI on port 4040. What I am specifically looking for is details from the optimiser itself, the physical and
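Short of process-level monitoring, the optimiser's plans for a given DataFrame can at least be inspected directly; a hedged sketch:

    // logical, analyzed, optimized and physical plans
    df.explain(true)
    // or programmatically
    val qe = df.queryExecution
    println(qe.optimizedPlan)
    println(qe.executedPlan)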

Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Eduardo Costa Alfaia
Hi Gourav, I did a test as you said and for me it's working; I am using spark in local mode, master and worker in the same machine. I ran the example in spark-shell --packages com.databricks:spark-csv_2.10:1.3.0 without errors. BR From: Gourav Sengupta Date: Monday,

Re: caching ratings with ALS implicit

2016-02-15 Thread Sean Owen
It will need its intermediate RDDs to be cached, and it will do that internally. See the setIntermediateRDDStorageLevel method. Skim the API docs too. On Mon, Feb 15, 2016 at 9:21 PM, Roberto Pagliari wrote: > Something not clear from the documentation is whether the
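A hedged sketch of setting that level on the MLlib ALS builder (parameter values are illustrative):

    import org.apache.spark.mllib.recommendation.ALS
    import org.apache.spark.storage.StorageLevel

    val als = new ALS()
      .setRank(10)
      .setIterations(10)
      .setImplicitPrefs(true)
      .setIntermediateRDDStorageLevel(StorageLevel.MEMORY_AND_DISK)
    val model = als.run(ratings)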

Re: recommendations with duplicate ratings

2016-02-15 Thread Sean Owen
You're asking what happens when you put many ratings for one user-item pair in the input, right? I'm saying you shouldn't do that -- aggregate them into one pair in your application. For rating-like (explicit) data, it doesn't really make sense otherwise. The only sensible aggregation is

caching ratings with ALS implicit

2016-02-15 Thread Roberto Pagliari
Something not clear from the documentation is whether the ratings RDD needs to be cached before calling ALS trainImplicit. Would there be any performance gain?

Re: recommendations with duplicate ratings

2016-02-15 Thread Roberto Pagliari
Hi Sean, I'm not sure what you mean by aggregate. The input of trainImplicit is an RDD of Ratings. I find it odd that duplicate ratings would mess with ALS in the implicit case. It'd be nice if it didn't. Thank you, On 15/02/2016 20:49, "Sean Owen" wrote: >I believe you

Re: recommendations with duplicate ratings

2016-02-15 Thread Sean Owen
I believe you need to aggregate inputs per user-item in your call. I am actually not sure what happens if you don't. I think it would compute the factors twice and one would win, so yes I think it would effectively be ignored. For implicit, that wouldn't work correctly, so you do need to

recommendations with duplicate ratings

2016-02-15 Thread Roberto Pagliari
What happens when duplicate user/ratings are fed into ALS (the implicit version, specifically)? Are duplicates ignored? I'm asking because that would save me a distinct. Thank you,

Re: temporary tables created by registerTempTable()

2016-02-15 Thread Mich Talebzadeh
Hi Michael, A temporary table in Hive is private to the session that created that table itself within the lifetime of that session. The table is created in the same database (in this case oraclehadoop.db) "first" and then moved to /tmp directory in hdfs in _tmp_space_db directory

How to partition a dataframe based on an Id?

2016-02-15 Thread SRK
Hi, How to partition a DataFrame of User objects based on an Id so that I can both do a join on an Id and also retrieve all the user objects in between a time period when queried? Thanks!

Is predicate push-down supported by default in dataframes?

2016-02-15 Thread SRK
Hi, Is predicate push-down supported by default in DataFrames, or is it dependent on the format in which the DataFrame is stored, like Parquet? Thanks, Swetha

[ANNOUNCE] Apache SystemML 0.9.0-incubating released

2016-02-15 Thread Luciano Resende
The Apache SystemML team is pleased to announce the release of Apache SystemML version 0.9.0-incubating. This is the first release as an Apache project. Apache SystemML provides declarative large-scale machine learning (ML) that aims at flexible specification of ML algorithms and automatic

Re: How to join an RDD with a hive table?

2016-02-15 Thread swetha kasireddy
How about saving the dataframe as a table partitioned by userId? My User records have userId, number of sessions, visit count etc as the columns and it should be partitioned by userId. I will need to join the userTable saved in the database as follows with an incoming session RDD. The session RDD
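A hedged sketch of that write (format and table name are placeholders; note that partitioning by a high-cardinality userId creates one directory per user, which may be impractical):

    userDf.write
      .format("parquet")
      .partitionBy("userId")
      .mode("overwrite")
      .saveAsTable("mydb.users")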

Re: Check if column exists in Schema

2016-02-15 Thread Sebastian Piu
I just realised this is a bit vague; I'm looking to create a function that looks into different columns to get a value. So depending on a type I might look into a given path or another (which might or might not exist). For example, if column some.path.to.my.date exists I'd return that, if it doesn't

Check if column exists in Schema

2016-02-15 Thread Sebastian Piu
Is there any way of checking if a given column exists in a Dataframe?

Re: temporary tables created by registerTempTable()

2016-02-15 Thread Michael Segel
I was just looking at that… Out of curiosity… if you make it a Hive Temp Table… who has access to the data? Just your app, or anyone with access to the same database? (Would you be able to share data across different JVMs? ) (E.G - I have a reader who reads from source A that needs to

Re: Stateful Operation on JavaPairDStream Help Needed !!

2016-02-15 Thread Abhishek Anand
I am now trying to use mapWithState in the following way using some example codes. But, by looking at the DAG it does not seem to checkpoint the state and when restarting the application from checkpoint, it re-partitions all the previous batches data from kafka. static Function3

Re: How to join an RDD with a hive table?

2016-02-15 Thread swetha kasireddy
OK. would it only query for the records that I want in hive as per filter or just load the entire table? My user table will have millions of records and I do not want to cause OOM errors by loading the entire table in memory. On Mon, Feb 15, 2016 at 12:51 AM, Mich Talebzadeh

SparkListener onApplicationEnd processing an RDD throws exception because of stopped SparkContext

2016-02-15 Thread Sumona Routh
Hi there, I am trying to implement a listener that performs as a post-processor which stores data about what was processed or erred. With this, I use an RDD that may or may not change during the course of the application. My thought was to use onApplicationEnd and then saveToCassandra call to

Memory problems and missing heartbeats

2016-02-15 Thread JOAQUIN GUANTER GONZALBEZ
Hello, I am facing two different issues with Spark in my project that are driving me crazy. I am currently running in EMR (Spark 1.5.2 + YARN), using the "--executor-memory 40G" option. Problem #1: Some of my processes get killed by YARN because the container is exceeding the
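For the container-killed problem, the usual (hedged) first step on YARN is to raise the off-heap overhead that YARN accounts for on top of the executor heap, e.g.:

    spark-submit \
      --executor-memory 40G \
      --conf spark.yarn.executor.memoryOverhead=6144 \
      ...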

More than one StateSpec in the same application

2016-02-15 Thread Udo Fholl
Hi all, Does StateSpec have their own state or the state is per stream, thus all StateSpec over the same stream will share the state? Thanks. Best regards, Udo.

Re: temporary tables created by registerTempTable()

2016-02-15 Thread Mich Talebzadeh
> Hi, > > Is my understanding correct that the registered temporary tables created by > registerTempTable() in the Spark shell are built on ORC files? > > For example the following Data Frame just creates a logical abstraction > > scala> var s = HiveContext.sql("SELECT AMOUNT_SOLD, TIME_ID,

Re: Passing multiple jar files to spark-shell

2016-02-15 Thread Mich Talebzadeh
Thanks Ted. I will have a look. regards On 15/02/2016 14:34, Ted Yu wrote: > Mich: > You can pass jars for driver using: > > spark.driver.extraClassPath > > Cheers > > On Mon, Feb 15, 2016 at 1:05 AM, Mich Talebzadeh wrote: > > Thanks Deng. Unfortunately it

Re: Passing multiple jar files to spark-shell

2016-02-15 Thread Ted Yu
Mich: You can pass jars for driver using: spark.driver.extraClassPath Cheers On Mon, Feb 15, 2016 at 1:05 AM, Mich Talebzadeh wrote: > Thanks Deng. Unfortunately it seems that it looks for driver-class-path as > well L > > > > For example with –jars alone I get > > > >
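Putting the two together, a hedged example invocation (reusing the master URL and jar paths from the thread; --driver-class-path is the command-line form of spark.driver.extraClassPath):

    spark-shell --master spark://50.140.197.217:7077 \
      --jars /home/hduser/jars/ojdbc6.jar,/home/hduser/jars/jconn4.jar \
      --driver-class-path /home/hduser/jars/ojdbc6.jar:/home/hduser/jars/jconn4.jar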

Re: Spark DataFrameNaFunctions unrecognized

2016-02-15 Thread Ted Yu
fill() was introduced in 1.3.1 Can you show code snippet which reproduces the error ? I tried the following using spark-shell on master branch: scala> df.na.fill(0) res0: org.apache.spark.sql.DataFrame = [col: int] Cheers On Mon, Feb 15, 2016 at 3:36 AM, satish chandra j

Re: Running synchronized JRI code

2016-02-15 Thread Simon Hafner
2016-02-15 14:02 GMT+01:00 Sun, Rui : > On computation, RRDD launches one R process for each partition, so there > won't be thread-safe issue > > Could you give more details on your new environment? Running on EC2, I start the executors via /usr/bin/R CMD javareconf -e

Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Gourav Sengupta
Hi Jorge/ All, Please please please go through this link http://spark.apache.org/docs/latest/spark-standalone.html. The link tells you how to start a SPARK cluster in local mode. If you have not started or worked in SPARK cluster in

Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Jorge Machado
Hi Gourav, I did not understand your problem… the --packages option should not make any difference if you are running standalone or in YARN, for example. Give us an example of what packages you are trying to load, and what error you are getting… If you want to use the libraries in

RE: Running synchronized JRI code

2016-02-15 Thread Sun, Rui
On computation, RRDD launches one R process for each partition, so there won't be thread-safe issue Could you give more details on your new environment? -Original Message- From: Simon Hafner [mailto:reactorm...@gmail.com] Sent: Monday, February 15, 2016 7:31 PM To: Sun, Rui

Re: Single context Spark from Python and Scala

2016-02-15 Thread Chandeep Singh
You could consider using Zeppelin - https://zeppelin.incubator.apache.org/docs/latest/interpreter/spark.html https://zeppelin.incubator.apache.org/ ZeppelinContext Zeppelin

Single context Spark from Python and Scala

2016-02-15 Thread Leonid Blokhin
Hello I want to work with a single Spark context from Python and Scala. Is it possible? Is it possible to share one between a started ./bin/pyspark and ./bin/spark-shell, as a dramatic example? Cheers, Leonid

Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Gourav Sengupta
Hi, I am grateful for everyone's response, but sadly no one here actually has read the question before responding. Has anyone yet tried starting a SPARK cluster as mentioned in the link in my email? :) Regards, Gourav On Mon, Feb 15, 2016 at 11:16 AM, Jorge Machado wrote: >

Spark DataFrameNaFunctions unrecognized

2016-02-15 Thread satish chandra j
Hi All, Currently I am using Spark version 1.4.0 and am getting an error when trying to use the "fill" function, which is one among the DataFrameNaFunctions Snippet: df.na.fill(col: ) Error: value na is not a member of org.apache.spark.sql.DataFrame As I need null values in column "col" of DataFrame "df" to

Re: Running synchronized JRI code

2016-02-15 Thread Simon Hafner
2016-02-15 4:35 GMT+01:00 Sun, Rui : > Yes, JRI loads an R dynamic library into the executor JVM, which faces > thread-safe issue when there are multiple task threads within the executor. > > I am thinking if the demand like yours (calling R code in RDD > transformations) is

Re: How to add kafka streaming jars when initialising the sparkcontext in python

2016-02-15 Thread Jorge Machado
Hi David, Just package with maven and deploy everything into one jar. You don't need to do it like this… Use Maven for example. And check if your cluster already has these libraries loaded. If you are using CDH for example you can just import the classes because they already are in the path

Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Jorge Machado
$SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.3.0 It will download everything for you and register into your JVM. If you want to use it in your Prod just package it with maven. > On 15/02/2016, at 12:14, Gourav Sengupta wrote: > > Hi, >

Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Gourav Sengupta
Hi, How do we include the following package: https://github.com/databricks/spark-csv while starting a SPARK standalone cluster as mentioned here: http://spark.apache.org/docs/latest/spark-standalone.html Thanks and Regards, Gourav Sengupta On Mon, Feb 15, 2016 at 10:32 AM, Ramanathan R

Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Ramanathan R
Hi Gourav, If your question is how to distribute python package dependencies across the Spark cluster programmatically? ...here is an example - $ export PYTHONPATH='path/to/thrift.zip:path/to/happybase.zip:path/to/your/py/application' And in code:

New line lost in streaming output file

2016-02-15 Thread Ashutosh Kumar
I am getting multiple empty files for streaming output for each interval. To avoid this I tried kStream.foreachRDD(new VoidFunction2(){ public void call(JavaRDD rdd, Time time) throws Exception { if(!rdd.isEmpty()){

Re: Using SPARK packages in Spark Cluster

2016-02-15 Thread Gourav Sengupta
Hi, So far no one is able to get my question at all. I know what it takes to load packages via SPARK shell or SPARK submit. How do I load packages when starting a SPARK cluster, as mentioned here http://spark.apache.org/docs/latest/spark-standalone.html ? Regards, Gourav Sengupta On Mon,
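Packages are resolved per application rather than when the standalone master and workers start; one hedged way to make a package available to every application submitted from a machine, without typing --packages each time, is conf/spark-defaults.conf:

    spark.jars.packages  com.databricks:spark-csv_2.10:1.3.0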

How to add kafka streaming jars when initialising the sparkcontext in python

2016-02-15 Thread David Kennedy
I have no problems when submitting the task using spark-submit. The --jars option with the list of jars required is successful and I see in the output the jars being added: 16/02/10 11:14:24 INFO spark.SparkContext: Added JAR file:/usr/lib/spark/extras/lib/spark-streaming-kafka.jar at

Re: Need help :Does anybody has HDP cluster on EC2?

2016-02-15 Thread Chandeep Singh
You could also fire up a VNC session and access all internal pages from there. > On Feb 15, 2016, at 9:19 AM, Divya Gehlot wrote: > > Hi Sabarish, > Thanks alot for your help. > I am able to view the logs now > > Thank you very much . > > Cheers, > Divya > > > On

Re: Need help :Does anybody has HDP cluster on EC2?

2016-02-15 Thread Divya Gehlot
Hi Sabarish, Thanks a lot for your help. I am able to view the logs now Thank you very much. Cheers, Divya On 15 February 2016 at 16:51, Sabarish Sasidharan < sabarish.sasidha...@manthan.com> wrote: > You can setup SSH tunneling. > > >

RE: Passing multiple jar files to spark-shell

2016-02-15 Thread Mich Talebzadeh
Thanks Deng. Unfortunately it seems that it looks for driver-class-path as well :( For example with --jars alone I get spark-shell --master spark://50.140.197.217:7077 --jars /home/hduser/jars/ojdbc6.jar,/home/hduser/jars/jconn4.jar s: org.apache.spark.sql.DataFrame = [AMOUNT_SOLD:

Re: Need help :Does anybody has HDP cluster on EC2?

2016-02-15 Thread Akhil Das
According to the documentation, the hostname that you are seeing for those properties is inherited from yarn.nodemanager.hostname. If your requirement is just to see the logs, then you can ssh-tunnel to the

Re: Need help :Does anybody has HDP cluster on EC2?

2016-02-15 Thread Sabarish Sasidharan
You can setup SSH tunneling. http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-ssh-tunnel.html Regards Sab On Mon, Feb 15, 2016 at 1:55 PM, Divya Gehlot wrote: > Hi, > I have hadoop cluster set up in EC2. > I am unable to view application logs in
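A hedged example of such a tunnel for the NodeManager web UI (key file, user and hostnames are placeholders):

    ssh -i mykey.pem -N \
      -L 8042:ip-xxx-xx-xx-xxx.ap-southeast-1.compute.internal:8042 \
      hadoop@ec2-public-dns.compute.amazonaws.com
    # then browse http://localhost:8042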

RE: How to join an RDD with a hive table?

2016-02-15 Thread Mich Talebzadeh
Also worthwhile using temporary tables for the joint query. I can join a Hive table with any other JDBC accessed table from any other databases with DF and temporary tables // //Get the FACT table from Hive // var s = HiveContext.sql("SELECT AMOUNT_SOLD, TIME_ID, CHANNEL_ID FROM

Re: How to join an RDD with a hive table?

2016-02-15 Thread Ted Yu
Have you tried creating a DataFrame from the RDD and join with DataFrame which corresponds to the hive table ? On Sun, Feb 14, 2016 at 9:53 PM, SRK wrote: > Hi, > > How to join an RDD with a hive table and retrieve only the records that I > am > interested. Suppose, I
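A hedged sketch of that approach (table, column and case class names are invented):

    import sqlContext.implicits._   // sqlContext is a HiveContext here

    case class Session(userId: String, sessionCount: Int)
    val sessionDf = sessionRdd.map { case (id, n) => Session(id, n) }.toDF()

    // join against the Hive table so only matching user records are pulled through
    val users = sqlContext.table("mydb.users")
    val joined = sessionDf.join(users, "userId")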

Re: Need help :Does anybody has HDP cluster on EC2?

2016-02-15 Thread Akhil Das
You can set *yarn.nodemanager.webapp.address* in the yarn-site.xml/yarn-default.xml file to change it I guess. Thanks Best Regards On Mon, Feb 15, 2016 at 1:55 PM, Divya Gehlot wrote: > Hi, > I have hadoop cluster set up in EC2. > I am unable to view application logs

Re: Best way to bring up Spark with Cassandra (and Elasticsearch) in production.

2016-02-15 Thread Ted Yu
Sounds reasonable. Please consider posting question on Spark C* connector on their mailing list if you have any. On Sun, Feb 14, 2016 at 7:51 PM, Kevin Burton wrote: > Afternoon. > > About 6 months ago I tried (and failed) to get Spark and Cassandra working > together in

Re: Unable to insert overwrite table with Spark 1.5.2

2016-02-15 Thread Ted Yu
Do you mind trying Spark 1.6.0 ? As far as I can tell, 'Cannot overwrite table' exception may only occur for CreateTableUsingAsSelect when source and dest relations refer to the same table in branch-1.6 Cheers On Sun, Feb 14, 2016 at 9:29 PM, Ramanathan R wrote: > Hi

Need help :Does anybody has HDP cluster on EC2?

2016-02-15 Thread Divya Gehlot
Hi, I have hadoop cluster set up in EC2. I am unable to view application logs in Web UI as its taking internal IP Like below : http://ip-xxx-xx-xx-xxx.ap-southeast-1.compute.internal:8042 How can I change this to external one or

Re: Scala types to StructType

2016-02-15 Thread Ted Yu
Please see the last line of convertToCatalyst(a: Any): case other => other FYI On Mon, Feb 15, 2016 at 12:09 AM, Fabian Böhnlein < fabian.boehnl...@gmail.com> wrote: > Interesting, thanks. > > The (only) publicly accessible method seems *convertToCatalyst*: > >

Re: Scala types to StructType

2016-02-15 Thread Fabian Böhnlein
Interesting, thanks. The (only) publicly accessible method seems to be convertToCatalyst: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala#L425 Seems it's missing some types like Integer, Short, Long... I'll give it

Re: mllib:Survival Analysis : assertion failed: AFTAggregator loss sum is infinity. Error for unknown reason.

2016-02-15 Thread Yanbo Liang
Hi Stuti, This is a bug in AFTSurvivalRegression; we did not handle "lossSum == infinity" properly. I have opened https://issues.apache.org/jira/browse/SPARK-13322 to track this issue and will send a PR. Thanks for reporting this issue. Yanbo 2016-02-12 15:03 GMT+08:00 Stuti Awasthi