Re: How to convert dataframe to a nested StructType schema

2015-09-15 Thread Terry Hole
Hao, for Spark 1.4.1 you can try this: val rowrdd = df.rdd.map(r => Row(Row(r(3)), Row(r(0), r(1), r(2)))) val newDF = sqlContext.createDataFrame(rowrdd, yourNewSchema) Thanks! - Terry On Wed, Sep 16, 2015 at 2:10 AM, Hao Wang wrote: > Hi, > > I created a dataframe
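
A fuller sketch of the approach above, assuming a source DataFrame df with four flat columns and an illustrative target schema that nests column 3 under one struct and columns 0-2 under another (all field names here are hypothetical, since the thread's schema is elided):

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._

    // Illustrative nested schema matching the reshaped rows below
    val yourNewSchema = StructType(Seq(
      StructField("key", StructType(Seq(
        StructField("id", StringType)))),
      StructField("values", StructType(Seq(
        StructField("a", StringType),
        StructField("b", StringType),
        StructField("c", StringType))))))

    // Reshape each flat Row into nested Rows that mirror the schema
    val rowrdd = df.rdd.map(r => Row(Row(r(3)), Row(r(0), r(1), r(2))))
    val newDF = sqlContext.createDataFrame(rowrdd, yourNewSchema)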

Re: Meets "java.lang.IllegalArgumentException" when test spark ml pipe with DecisionTreeClassifier

2015-09-09 Thread Terry Hole
this metadata when you > construct the input data. > > On Sun, Sep 6, 2015 at 10:41 AM, Terry Hole <hujie.ea...@gmail.com> wrote: > > Sean > > > > Do you know how to tell the decision tree that the "label" is binary, or set > > some attributes on the datafra
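
A sketch of one way to attach that label metadata, assuming Spark 1.4+ and a DataFrame df with a double-valued "label" column (the ML attribute API is what the reply refers to; the two-class count here is an illustrative assumption):

    import org.apache.spark.ml.attribute.NominalAttribute

    // Mark "label" as a nominal attribute with 2 values so the
    // DecisionTreeClassifier can infer the number of classes
    val labelMeta = NominalAttribute.defaultAttr
      .withName("label")
      .withNumValues(2)
      .toMetadata()
    val dfWithMeta = df.withColumn("label", df("label").as("label", labelMeta))

Alternatively, running the label through a StringIndexer yields a column that already carries this metadata.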

Re: Meets "java.lang.IllegalArgumentException" when test spark ml pipe with DecisionTreeClassifier

2015-09-07 Thread Terry Hole
Xiangrui, Do you have any idea how to make this work? Thanks - Terry Terry Hole <hujie.ea...@gmail.com> wrote on Sun, Sep 6, 2015 at 17:41: > Sean > > Do you know how to tell the decision tree that the "label" is binary, or set > some attributes on the dataframe to carry the number of cl

Re: Meets "java.lang.IllegalArgumentException" when test spark ml pipe with DecisionTreeClassifier

2015-09-06 Thread Terry Hole
the line you've not specified your label > column -- it's defaulting to "label" and it does not recognize it, or > at least not as a binary or nominal attribute. > > On Sun, Sep 6, 2015 at 5:47 AM, Terry Hole <hujie.ea...@gmail.com> wrote: > > Hi, Experts,

Re: Meets "java.lang.IllegalArgumentException" when test spark ml pipe with DecisionTreeClassifier

2015-09-06 Thread Terry Hole
not a binary or nominal attribute > though. I think that's the missing step. A double-valued column need > not be one of these attribute types. > > On Sun, Sep 6, 2015 at 10:14 AM, Terry Hole <hujie.ea...@gmail.com> wrote: > > Hi, Owen, > > > > The dataframe

Re: Job aborted due to stage failure: java.lang.StringIndexOutOfBoundsException: String index out of range: 18

2015-08-28 Thread Terry Hole
Ricky, You may need to use map instead of flatMap in your case: val rowRDD = sc.textFile("/user/spark/short_model").map(_.split("\\t")).map(p => Row(...)) Thanks! -Terry On Fri, Aug 28, 2015 at 5:08 PM, our...@cnsuning.com our...@cnsuning.com wrote: hi all, when using spark sql, a problem
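
For context, a short sketch of why map rather than flatMap is needed here: flatMap flattens each split line into individual tokens, so the downstream Row would be built from single strings rather than one Array[String] per line (the path comes from the thread; the Row fields are illustrative since the original ones are elided):

    import org.apache.spark.sql.Row

    val lines = sc.textFile("/user/spark/short_model")

    // map keeps one Array[String] per input line, as Row(...) expects
    val rowRDD = lines.map(_.split("\\t")).map(p => Row(p(0), p(1)))

    // flatMap instead emits every token as its own element:
    // lines.flatMap(_.split("\\t"))  // RDD[String], per-line structure lost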

Re: standalone to connect mysql

2015-07-21 Thread Terry Hole
that? Best regards, Jack From: Terry Hole [mailto:hujie.ea...@gmail.com] Sent: Tuesday, 21 July 2015 4:17 PM To: Jack Yang; user@spark.apache.org Subject: Re: standalone to connect mysql Maybe you can try: spark-submit --class sparkwithscala.SqlApp --jars /home/lib/mysql-connector

Re: standalone to connect mysql

2015-07-21 Thread Terry Hole
Maybe you can try: spark-submit --class sparkwithscala.SqlApp --jars /home/lib/mysql-connector-java-5.1.34.jar --master spark://hadoop1:7077 /home/myjar.jar Thanks! -Terry Hi there, I would like to use spark to access the data in mysql. So firstly I tried to run the program using:
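
Once the driver jar is on the classpath, the read itself would look roughly like this (a Spark 1.4-style sketch; the host, database, table, and credentials are placeholders):

    val jdbcDF = sqlContext.read.format("jdbc").options(Map(
      "url"      -> "jdbc:mysql://hadoop1:3306/mydb",  // placeholder db
      "driver"   -> "com.mysql.jdbc.Driver",
      "dbtable"  -> "mytable",                         // placeholder table
      "user"     -> "spark",
      "password" -> "secret")).load()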

Re: [Spark Shell] Could the spark shell be reset to the original status?

2015-07-16 Thread Terry Hole
Thanks! - Terry Ted Yu yuzhih...@gmail.com wrote on Fri, Jul 17, 2015 at 12:02 PM: See this recent thread: http://search-hadoop.com/m/q3RTtFW7iMDkrj61/Spark+shell+oom+subj=java+lang+OutOfMemoryError+PermGen+space On Jul 16, 2015, at 8:51 PM, Terry Hole hujie.ea...@gmail.com wrote: Hi, Background

[Spark Shell] Could the spark shell be reset to the original status?

2015-07-16 Thread Terry Hole
Hi, Background: The spark shell will get an out of memory error after doing lots of spark work. Is there any method to reset the spark shell to its startup status? I tried :reset, but it does not seem to work: I cannot create a spark context anymore (some compile error as below) after the
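
In the absence of a working :reset, one workaround sketch for this era of Spark is to stop the context and construct a fresh one by hand, though accumulated compiled classes can still exhaust PermGen and force a full shell restart (the master URL below is an assumption):

    sc.stop()
    import org.apache.spark.{SparkConf, SparkContext}
    val conf = new SparkConf().setAppName("shell").setMaster("local[*]")
    val sc2 = new SparkContext(conf)  // sc itself cannot be rebound in the shell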

Re: fileStream with old files

2015-07-15 Thread Terry Hole
); } Files.deleteIfExists(testDir); } From: Tathagata Das [mailto:t...@databricks.com] Sent: Wednesday, July 15, 2015 00:01 To: Terry Hole Cc: Hunter Morgan; user@spark.apache.org Subject: Re: fileStream with old files It was added, but it's not documented publicly. I

Re: fileStream with old files

2015-07-13 Thread Terry Hole
A new configuration named spark.streaming.minRememberDuration was added in 1.2.1 to control the file stream input; the default value is 60 seconds. You can change it to a larger value to include older files (older than 1 minute). You can get the details from this JIRA:
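
A sketch of setting it when building the streaming context (the value is given in plain seconds here; accepted formats may differ across versions):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("file-stream")
      // remember files up to one hour old instead of the 60s default
      .set("spark.streaming.minRememberDuration", "3600")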

Re: [Spark Hive SQL] Set the hive connection in hive context is broken in spark 1.4.1-rc1?

2015-07-10 Thread Terry Hole
Michael, Thanks - Terry Michael Armbrust mich...@databricks.com wrote on Sat, Jul 11, 2015 at 04:02: Metastore configuration should be set in hive-site.xml. On Thu, Jul 9, 2015 at 8:59 PM, Terry Hole hujie.ea...@gmail.com wrote: Hi, I am trying to set the hive metadata destination to a mysql database
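
For reference, a minimal hive-site.xml sketch pointing the metastore at MySQL (host, database name, and credentials are placeholders):

    <configuration>
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>secret</value>
      </property>
    </configuration>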

[Spark Hive SQL] Set the hive connection in hive context is broken in spark 1.4.1-rc1?

2015-07-09 Thread Terry Hole
Hi, I am trying to set the hive metadata destination to a mysql database in hive context. It works fine in spark 1.3.1, but it seems broken in spark 1.4.1-rc1, where it always connects to the default metastore (local). Is this a regression, or must the connection be set in hive-site.xml? The code

Re: Is there a way to shutdown the derby in hive context in spark shell?

2015-07-09 Thread Terry Hole
and creating a new one? Thanks Best Regards On Wed, Jul 8, 2015 at 8:12 PM, Terry Hole hujie.ea...@gmail.com wrote: I am using spark 1.4.1rc1 with default hive settings Thanks - Terry Hi All, I'd like to use the hive context in the spark shell. I need to recreate the hive meta database in the same

Is there a way to shutdown the derby in hive context in spark shell?

2015-07-08 Thread Terry Hole
Hi All, I'd like to use the hive context in the spark shell. I need to recreate the hive meta database in the same location, so I want to close the derby connection previously created in the spark shell. Is there any way to do this? I tried this, but it does not work:
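
For reference, the standard Derby idiom for shutting down an embedded engine is a JDBC connection with shutdown=true; a sketch follows, though whether this actually releases the metastore lock held inside the spark shell is exactly what this thread leaves open:

    import java.sql.{DriverManager, SQLException}

    try {
      // Derby signals a successful full shutdown with SQLState XJ015
      DriverManager.getConnection("jdbc:derby:;shutdown=true")
    } catch {
      case e: SQLException if e.getSQLState == "XJ015" => // shut down cleanly
    }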

Re: Is there a way to shutdown the derby in hive context in spark shell?

2015-07-08 Thread Terry Hole
I am using spark 1.4.1rc1 with default hive settings Thanks - Terry Hi All, I'd like to use the hive context in the spark shell. I need to recreate the hive meta database in the same location, so I want to close the derby connection previously created in the spark shell. Is there any way to do this?

Re: Meets class not found error in spark console with newly hive context

2015-07-02 Thread Terry Hole
Found that this is a bug in spark 1.4.0: SPARK-8368 https://issues.apache.org/jira/browse/SPARK-8368 Thanks! Terry On Thu, Jul 2, 2015 at 1:20 PM, Terry Hole hujie.ea...@gmail.com wrote: All, I am using spark console 1.4.0 to do some tests; when I create a new HiveContext (line 18 in the code

Meets class not found error in spark console with newly hive context

2015-07-01 Thread Terry Hole
All, I am using spark console 1.4.0 to do some tests. When I create a new HiveContext (line 18 in the code) in my test function, it always throws an exception like the one below (it works in spark console 1.3.0), but if I remove the HiveContext (line 18 in the code) from my function, it works fine. Any

Re: Is it possible to set the akka specify properties (akka.extensions) in spark

2015-05-11 Thread Terry Hole
,Whatever); underneath, I think Spark won't ship properties that don't start with spark.* to the executors. Thanks Best Regards On Mon, May 11, 2015 at 8:33 AM, Terry Hole hujie.ea...@gmail.com wrote: Hi all, I'd like to monitor akka using kamon, which needs the akka.extension
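
Given that constraint, one possible workaround sketch is to pass the Akka settings as JVM system properties, which Typesafe Config treats as overrides (list elements are addressed by index); whether Spark's internal ActorSystem actually picks these up is not confirmed in this thread, and the class and jar names are placeholders:

    spark-submit \
      --conf "spark.driver.extraJavaOptions=-Dakka.extensions.0=kamon.system.SystemMetrics -Dakka.extensions.1=kamon.statsd.StatsD" \
      --conf "spark.executor.extraJavaOptions=-Dakka.extensions.0=kamon.system.SystemMetrics -Dakka.extensions.1=kamon.statsd.StatsD" \
      --class MyApp myapp.jar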

Is it possible to set the akka specify properties (akka.extensions) in spark

2015-05-10 Thread Terry Hole
Hi all, I'd like to monitor akka using kamon, which needs akka.extensions set to a list like this in typesafe config format: akka { extensions = [kamon.system.SystemMetrics, kamon.statsd.StatsD] } But I cannot find a way to do this; I have tried these: 1.

Is it possible to set the akka specify properties (akka.extensions) in spark

2015-05-07 Thread Terry Hole
Hi all, I'd like to monitor akka using kamon, which needs akka.extensions set to a list like this in typesafe config format: akka { extensions = [kamon.system.SystemMetrics, kamon.statsd.StatsD] } But I cannot find a way to do this; I have tried these: 1.

Re: spark 1.3.0 strange log message

2015-04-23 Thread Terry Hole
Use this in spark conf: spark.ui.showConsoleProgress=false Best Regards, On Fri, Apr 24, 2015 at 11:23 AM, Henry Hung ythu...@winbond.com wrote: Dear All, When using spark 1.3.0 spark-submit with stdout and stderr redirected to a log file, I saw some strange lines inside that look like this:
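
For example, passed on the command line (the class and jar names are placeholders); the "strange lines" in question are the stage progress bars that Spark 1.3 prints to the console by default:

    spark-submit --conf spark.ui.showConsoleProgress=false --class MyApp myapp.jar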

Fwd: [Spark Streaming] The FileInputDStream newFilesOnly=false does not work in 1.2 since

2015-01-20 Thread Terry Hole
Hi, I am trying to move from 1.1 to 1.2 and found that newFilesOnly=false (intended to include old files) does not work anymore. It worked great in 1.1; this should have been introduced by the last change to this class. Did the flag's behavior change, or is it a regression? The issue should be caused by
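
For reference, a sketch of the call whose flag is under discussion, assuming the standard text-file input types (the directory path is a placeholder):

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

    // newFilesOnly = false is meant to also process files already present
    // in the directory at stream start, not only files created afterwards
    val stream = ssc.fileStream[LongWritable, Text, TextInputFormat](
      "/data/in",
      (path: Path) => true,  // accept every file
      newFilesOnly = false)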