Re: Isotonic Regression, run method overloaded Error

2016-07-10 Thread Yanbo Liang
Hi Swaroop, Would you mind sharing your code so that others can help you figure out what caused this error? I can run the isotonic regression examples well. Thanks Yanbo 2016-07-08 13:38 GMT-07:00 dsp : > Hi I am trying to perform Isotonic Regression on a data set with
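
[For reference, the MLlib isotonic regression API expects an RDD of (label, feature, weight) triples, and the "overloaded run method" error typically appears when the RDD element type does not match that shape. A minimal Scala sketch, with placeholder data not taken from the original post:

import org.apache.spark.mllib.regression.IsotonicRegression

// Each record is (label, feature, weight); weight is often just 1.0.
val training = sc.parallelize(Seq((1.0, 1.0, 1.0), (2.0, 2.0, 1.0), (3.0, 3.0, 1.0)))

// run() is overloaded for RDD and JavaRDD; a Scala RDD[(Double, Double, Double)]
// resolves to the RDD variant.
val model = new IsotonicRegression().setIsotonic(true).run(training)
val prediction = model.predict(2.5)
]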

Re: mllib based on dataset or dataframe

2016-07-10 Thread Yanbo Liang
DataFrame is a special case of Dataset, so they mean the same thing here. Actually, the ML pipeline API will accept Dataset[_] instead of DataFrame in Spark 2.0. More accurately, we can say that MLlib will focus on the Dataset-based API for further development. Thanks Yanbo 2016-07-10 20:35
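
[For context, in Spark 2.0 DataFrame is defined as an alias for Dataset[Row], which is why the two terms are interchangeable here. A rough illustration (app name is an assumption):

import org.apache.spark.sql.{Dataset, Row, SparkSession}

// In Spark 2.0, org.apache.spark.sql defines roughly: type DataFrame = Dataset[Row]
// so an API that accepts Dataset[_] also accepts any DataFrame.
val spark = SparkSession.builder().appName("dataframe-vs-dataset").getOrCreate()
val df: Dataset[Row] = spark.range(10).toDF("id")   // a DataFrame is a Dataset[Row]
]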

Re: How to run Zeppelin and Spark Thrift Server Together

2016-07-10 Thread Chanh Le
Hi Ayan, I tested it and it works fine, but one more point of confusion: what if my (technical) users want to write some code in Zeppelin to apply things to a Hive table? Zeppelin and STS can’t share a Spark Context, which means we need separate processes? Is there any way to use the same Spark Context as STS? Regards,

Problem connecting Zeppelin 0.6 to Spark Thrift Server

2016-07-10 Thread Mich Talebzadeh
Hi, I can use a JDBC connection to connect from the Squirrel client to the Spark Thrift Server and this works fine. I have Zeppelin 0.6.0, which works OK with the default Spark interpreter. I configured the JDBC interpreter to connect to the Spark Thrift Server as follows [image: Inline images 1] I can use

Re: StreamingKmeans Spark doesn't work at all

2016-07-10 Thread Shuai Lin
I would suggest you run the Scala version of the example first, so you can tell whether it's a problem with the data you provided or a problem with the Java code. On Mon, Jul 11, 2016 at 2:37 AM, Biplob Biswas wrote: > Hi, > > I know I am asking again, but I tried running
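
[For anyone following the thread, a minimal Scala sketch of the StreamingKMeans flow; the stream paths, dimensionality, and batch interval below are made up for illustration:

import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(5))

// Training stream: lines like "[1.0,2.0,3.0]"; test stream: LabeledPoint.parse format.
val trainingData = ssc.textFileStream("/tmp/kmeans/train").map(Vectors.parse)
val testData = ssc.textFileStream("/tmp/kmeans/test").map(LabeledPoint.parse)

val model = new StreamingKMeans()
  .setK(2)
  .setDecayFactor(1.0)
  .setRandomCenters(3, 0.0)   // (dimension, weight)

model.trainOn(trainingData)
model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()

ssc.start()
ssc.awaitTermination()
]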

Re: KEYS file?

2016-07-10 Thread Shuai Lin
> > at least links to the keys used to sign releases on the > download page +1 for that. On Mon, Jul 11, 2016 at 3:35 AM, Phil Steitz wrote: > On 7/10/16 10:57 AM, Shuai Lin wrote: > > Not sure where you see " 0x7C6C105FFC8ED089". I > > That's the key ID for the key

Spark logging

2016-07-10 Thread SamyaMaiti
Hi Team, I have a Spark application up & running on a 10-node standalone cluster. When I launch the application in cluster mode I am able to create a separate log file for the driver & executors (common for all executors). But my requirement is to create a separate log file for each executor. Is it

mllib based on dataset or dataframe

2016-07-10 Thread jinhong lu
Hi, Since Dataset will be the major API in Spark 2.0, why will MLlib be DataFrame-based, and why will 'future development focus on the DataFrame-based API'? Is there any plan to change MLlib from DataFrame-based to Dataset-based? Thanks, lujinhong

Re: Spark crashes with two parquet files

2016-07-10 Thread Takeshi Yamamuro
The log explicitly said "java.lang.OutOfMemoryError: Java heap space", so you need to allocate more JVM memory for Spark? // maropu On Mon, Jul 11, 2016 at 11:59 AM, Javier Rey wrote: > Also the problem appears when I used clause: unionAll > > 2016-07-10 21:58 GMT-05:00

Re: How to run Zeppelin and Spark Thrift Server Together

2016-07-10 Thread Takeshi Yamamuro
Hi, ISTM multiple SparkContexts are not recommended in Spark. See: https://issues.apache.org/jira/browse/SPARK-2243 // maropu On Mon, Jul 11, 2016 at 12:01 PM, ayan guha wrote: > Hi > > Can you try using JDBC interpreter with STS? We are using Zeppelin+STS on > YARN for

Re: How to run Zeppelin and Spark Thrift Server Together

2016-07-10 Thread Chanh Le
Hi Ayan, It is a brilliant idea. Thank you very much. I will try this way. Regards, Chanh > On Jul 11, 2016, at 10:01 AM, ayan guha wrote: > > Hi > > Can you try using JDBC interpreter with STS? We are using Zeppelin+STS on > YARN for few months now without much issue.

Re: "client / server" config

2016-07-10 Thread ayan guha
Yes, that is expected to move on. If it looks like it is waiting for something, my first instinct would be to check network connectivity: your cluster must have access back to your Mac to read the file (it is probably waiting to time out). On Mon, Jul 11, 2016 at 12:59 PM, Jean Georges Perrin

Re: How to run Zeppelin and Spark Thrift Server Together

2016-07-10 Thread ayan guha
Hi, Can you try using the JDBC interpreter with STS? We have been using Zeppelin+STS on YARN for a few months now without much issue. On Mon, Jul 11, 2016 at 12:48 PM, Chanh Le wrote: > Hi everybody, > We are using Spark to query big data and currently we’re using Zeppelin to > provide

Re: "client / server" config

2016-07-10 Thread Jean Georges Perrin
Good for the file :) No, it goes on... as if it was waiting for something. jg > On Jul 10, 2016, at 22:55, ayan guha wrote: > > Is this terminating the execution or spark application still runs after this > error? > > One thing for sure, it is looking for local file on

Re: "client / server" config

2016-07-10 Thread ayan guha
Is this terminating the execution, or does the Spark application still run after this error? One thing for sure: it is looking for a local file on the driver (i.e. your Mac) at location: file:/Users/jgp/Documents/Data/restaurants-data.json On Mon, Jul 11, 2016 at 12:33 PM, Jean Georges Perrin
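
[To make this concrete, a rough sketch of the usual fix when the driver and the cluster do not share a filesystem: either put the data somewhere every executor can read, or read it on the driver and ship it yourself. The HDFS path below is purely illustrative; only the local path comes from the thread.

// Option 1: put the data somewhere every executor can reach (HDFS, NFS, S3, ...).
val df = sqlContext.read.json("hdfs://namenode:8020/data/restaurants-data.json")

// Option 2: read the file on the driver and parallelize it to the cluster.
val lines = scala.io.Source.fromFile("/Users/jgp/Documents/Data/restaurants-data.json").getLines().toSeq
val df2 = sqlContext.read.json(sc.parallelize(lines))
]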

How to run Zeppelin and Spark Thrift Server Together

2016-07-10 Thread Chanh Le
Hi everybody, We are using Spark to query big data and currently we’re using Zeppelin to provide a UI for technical users. Now we also need to provide a UI for business users, so we use Oracle BI tools and set up a Spark Thrift Server (STS) for it. When I run both Zeppelin and STS, it throws an error:

Re: Spark crashes with two parquet files

2016-07-10 Thread Takeshi Yamamuro
Hi, What's the schema in the parquets? Also, could you show us the stack trace when the error happens? // maropu On Mon, Jul 11, 2016 at 11:42 AM, Javier Rey wrote: > Hi everybody, > > I installed Spark 1.6.1, I have two parquet files, but when I need show > registers using
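
[As a quick way to check the schemas before the union, something like the following; the file names are assumptions, only the /data/train_parquet/ directory comes from the original post:

val df1 = sqlContext.read.parquet("/data/train_parquet/part-0.parquet")
val df2 = sqlContext.read.parquet("/data/train_parquet/part-1.parquet")

// unionAll in 1.6 matches columns by position, so both schemas must line up.
df1.printSchema()
df2.printSchema()

val all = df1.unionAll(df2)
all.show()
]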

Spark crashes with two parquet files

2016-07-10 Thread Javier Rey
Hi everybody, I installed Spark 1.6.1. I have two parquet files, but when I try to show records using unionAll, Spark crashes and I don't understand what happens. When I use show() on only one parquet file, it works correctly. Code with the fault: path = '/data/train_parquet/' train_df =

"client / server" config

2016-07-10 Thread Jean Georges Perrin
I have my dev environment on my Mac. I have a dev Spark server on a freshly installed physical Ubuntu box. I had some connection issues, but it is now all fine. In my code, running on the Mac, I have: SparkConf conf = new
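
[For reference, a typical configuration for the setup described (driver on the Mac, standalone master on the Ubuntu box) looks roughly like the Scala sketch below; the app name is made up, and 7077 is assumed to be the default standalone master port:

import org.apache.spark.{SparkConf, SparkContext}

// Point the driver at the standalone master on the Ubuntu box.
val conf = new SparkConf()
  .setAppName("my-app")
  .setMaster("spark://10.0.100.120:7077")

val sc = new SparkContext(conf)
]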

Re: Network issue on deployment

2016-07-10 Thread Jean Georges Perrin
It appears I had issues in my /etc/hosts... it seems OK now > On Jul 10, 2016, at 2:13 PM, Jean Georges Perrin wrote: > > I tested that: > > I set: > > _JAVA_OPTIONS=-Djava.net.preferIPv4Stack=true > SPARK_LOCAL_IP=10.0.100.120 > I still have the warning in the log: > >

Re: IS NOT NULL is not working in programmatic SQL in spark

2016-07-10 Thread Takeshi Yamamuro
Hi, One solution is to use `spark-csv` (see: https://github.com/databricks/spark-csv#features). To load NULLs, you can use the `nullValue` option there. // maropu On Mon, Jul 11, 2016 at 1:14 AM, Radha krishna wrote: > I want to apply null comparison to a column in sqlcontext.sql,
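
[A rough sketch of that approach with spark-csv; the input path comes from the earlier post, the table name is made up, and the column name assumes spark-csv's default naming (C0, C1, ...) when there is no header:

// Treat empty strings as SQL NULLs when loading, so IS NOT NULL behaves as expected.
val cntdf = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "false")
  .option("nullValue", "")
  .load("/user/poc_hortonworks/radha/gsd/sample.txt")

cntdf.registerTempTable("cnt")
sqlContext.sql("SELECT * FROM cnt WHERE C1 IS NOT NULL").show()
]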

Re: KEYS file?

2016-07-10 Thread Phil Steitz
On 7/10/16 10:57 AM, Shuai Lin wrote: > Not sure where you see " 0x7C6C105FFC8ED089". I That's the key ID for the key below. > think the release is signed with the > key https://people.apache.org/keys/committer/pwendell.asc . Thanks! That key matches. The project should publish a KEYS file [1]

Re: StreamingKmeans Spark doesn't work at all

2016-07-10 Thread Biplob Biswas
Hi, I know I am asking again, but I tried running the same thing on a Mac as well, since some answers on the internet suggested it could be an issue with the Windows environment, but still nothing works. Can anyone at least suggest whether it's a bug with Spark or something else? Would be really

Re: Network issue on deployment

2016-07-10 Thread Jean Georges Perrin
I tested that: I set: _JAVA_OPTIONS=-Djava.net.preferIPv4Stack=true SPARK_LOCAL_IP=10.0.100.120 I still have the warning in the log: 16/07/10 14:10:13 WARN Utils: Your hostname, micha resolves to a loopback address: 127.0.1.1; using 10.0.100.120 instead (on interface eno1) 16/07/10 14:10:13

Re: KEYS file?

2016-07-10 Thread Shuai Lin
Not sure where you see " 0x7C6C105FFC8ED089". I think the release is signed with the key https://people.apache.org/keys/committer/pwendell.asc . I think this tutorial can be helpful: http://www.apache.org/info/verification.html On Mon, Jul 11, 2016 at 12:57 AM, Phil Steitz

Network issue on deployment

2016-07-10 Thread Jean Georges Perrin
Hi, So far I have been using Spark "embedded" in my app. Now, I'd like to run it on a dedicated server. This is how far I am: - fresh Ubuntu 16, server name is mocha / IP 10.0.100.120, installed Scala 2.10, installed Spark 1.6.2, recompiled - Pi test works - UI on port 8080 works Log says: Spark

KEYS file?

2016-07-10 Thread Phil Steitz
I can't seem to find a link to the Spark KEYS file. I am trying to validate the sigs on the 1.6.2 release artifacts and I need to import 0x7C6C105FFC8ED089. Is there a KEYS file available for download somewhere? Apologies if I am just missing an obvious link. Phil

Re: IS NOT NULL is not working in programmatic SQL in spark

2016-07-10 Thread Radha krishna
I want to apply a null comparison to a column in sqlcontext.sql; is there any way to achieve this? On Jul 10, 2016 8:55 PM, "Radha krishna" wrote: > Ok thank you, how to achieve the requirement. > > On Sun, Jul 10, 2016 at 8:44 PM, Sean Owen wrote: > >> It

How to Register Permanent User-Defined-Functions (UDFs) in SparkSQL

2016-07-10 Thread Lokesh Yadav
Hi with sqlContext we can register a UDF like this: sqlContext.udf.register("sample_fn", sample_fn _ ) But this UDF is limited to that particular sqlContext only. I wish to make the registration persistent, so that I can access the same UDF in any subsequent sqlcontext. Or is there any other way
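
[One common workaround (not persistence in the strict sense) is to keep all registrations in a helper object and call it on every new context, roughly as sketched below; the object name and the placeholder UDF body are illustrative, not from the original post:

import org.apache.spark.sql.SQLContext

object MyUdfs {
  // Example UDF body; substitute the real sample_fn logic here.
  def sample_fn(s: String): String = s.toUpperCase

  // Call this once for every SQLContext (or HiveContext) you create.
  def registerAll(sqlContext: SQLContext): Unit = {
    sqlContext.udf.register("sample_fn", sample_fn _)
  }
}

// Usage: MyUdfs.registerAll(sqlContext) before running SQL that uses sample_fn.
]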

Re: IS NOT NULL is not working in programmatic SQL in spark

2016-07-10 Thread Radha krishna
OK, thank you. How do I achieve the requirement? On Sun, Jul 10, 2016 at 8:44 PM, Sean Owen wrote: > It doesn't look like you have a NULL field, You have a string-value > field with an empty string. > > On Sun, Jul 10, 2016 at 3:19 PM, Radha krishna wrote:

Re: IS NOT NULL is not working in programmatic SQL in spark

2016-07-10 Thread Sean Owen
It doesn't look like you have a NULL field; you have a string-valued field that contains an empty string. On Sun, Jul 10, 2016 at 3:19 PM, Radha krishna wrote: > Hi All,IS NOT NULL is not working in programmatic sql. check below for input > output and code. > > Input > > 10,IN >
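
[In other words, a NULL check and an empty-string check are different filters. A rough sketch of handling both in the SQL itself, assuming the DataFrame from the earlier post is registered as a temp table named cnt:

// Filter out both genuine NULLs and empty/blank strings in the query.
val result = sqlContext.sql(
  "SELECT id, code FROM cnt WHERE code IS NOT NULL AND trim(code) <> ''")
result.show()
]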

IS NOT NULL is not working in programmatic SQL in spark

2016-07-10 Thread radha
Hi All, IS NOT NULL is not working in programmatic SQL. Check below for input, output, and code. Input 10,IN 11,PK 12,US 13,UK 14,US 15,IN 16, 17,AS 18,AS 19,IR 20,As val cntdat = sc.textFile("/user/poc_hortonworks/radha/gsd/sample.txt"); case class CNT (id:Int , code : String) val cntdf =

IS NOT NULL is not working in programmatic SQL in spark

2016-07-10 Thread Radha krishna
Hi All, IS NOT NULL is not working in programmatic SQL. Check below for input, output, and code. Input 10,IN 11,PK 12,US 13,UK 14,US 15,IN 16, 17,AS 18,AS 19,IR 20,As val cntdat = sc.textFile("/user/poc_hortonworks/radha/gsd/sample.txt"); case class CNT (id:Int , code : String) val cntdf =

location of a partition in the cluster/ how parallelize method distribute the RDD partitions over the cluster.

2016-07-10 Thread Mazen
Hi, Any hint about getting the location of a particular RDD partition on the cluster? Or a workaround? The parallelize method partitions an RDD into splits as specified, or as per the default parallelism configuration. Does parallelize actually distribute the partitions into the
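
[On the second question: parallelize only splits the driver-side collection lazily, and the partitions are shipped to executors when an action runs. One rough way to observe where each partition was actually computed (a sketch; it reports the host of the task that processed the partition for that job, not a fixed placement):

val rdd = sc.parallelize(1 to 100, 8)

// Record which host computed each partition during this action.
val placement = rdd.mapPartitionsWithIndex { (index, iter) =>
  val host = java.net.InetAddress.getLocalHost.getHostName
  Iterator((index, host, iter.size))
}.collect()

placement.foreach { case (index, host, count) =>
  println(s"partition $index -> $host ($count elements)")
}
]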

Re: How to spin up Kafka using docker and use for Spark Streaming Integration tests

2016-07-10 Thread Lars Albertsson
Let us assume that you want to build an integration test setup where you run all participating components in Docker. You create a docker-compose.yml with four Docker images, something like this: # Start docker-compose.yml version: '2' services: myapp: build: myapp_dir links: -