how to print auc & prc for GBTClassifier, which is okay for RandomForestClassifier

2016-11-27 Thread Zhiliang Zhu
Hi All, I need to print auc and prc for a GBTClassifier model; it seems okay for RandomForestClassifier but not for GBTClassifier, though the rawPrediction column is not in the original data either. The code is: ..    // Set up Pipeline    val stages = new
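A minimal Scala sketch of one way to get both metrics (assuming Spark 2.2+, where the fitted GBTClassificationModel does emit a rawPrediction column; trainingData, testData and the column names are illustrative):

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.GBTClassifier
    import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator

    val gbt = new GBTClassifier().setLabelCol("label").setFeaturesCol("features")
    val model = new Pipeline().setStages(Array(gbt)).fit(trainingData)
    val predictions = model.transform(testData)

    val evaluator = new BinaryClassificationEvaluator()
      .setLabelCol("label")
      .setRawPredictionCol("rawPrediction")
    val auc = evaluator.setMetricName("areaUnderROC").evaluate(predictions)   // area under ROC
    val prc = evaluator.setMetricName("areaUnderPR").evaluate(predictions)    // area under PR curve

On earlier releases the fitted GBT model only produced a prediction column, which is why the same evaluator call worked for RandomForestClassifier but not GBTClassifier.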

Re: how to see Pipeline model information

2016-11-27 Thread Zhiliang Zhu
…setEstimator(pipeline).setEvaluator(new RegressionEvaluator).setEstimatorParamMaps(paramGrid)    val cvModel = cv.fit(data)    val plmodel = cvModel.bestModel.asInstanceOf[PipelineModel]    val lrModel = plmodel.stages(0).asInstanceOf[LinearRegressionModel] On 24 November 2016 at 10:23, Zhiliang Zhu

Re: get specific tree or forest structure from pipeline model

2016-11-24 Thread Zhiliang Zhu
Scala code is also fine for me, if there is some solution. On Friday, November 25, 2016 1:27 AM, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID> wrote: Hi All, here I want to print the specific tree or forest structure from the pipeline model. However, it seems that I hit more issues

get specific tree or forest structure from pipeline model

2016-11-24 Thread Zhiliang Zhu
Hi All, here I want to print the specific tree or forest structure from the pipeline model. However, it seems I hit more issues around XXXClassifier vs. XXXClassificationModel, as in the code below: ...        GBTClassifier gbtModel = new GBTClassifier();        ParamMap[] grid = new
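The structure lives on the fitted model, not on the estimator. A hedged Scala sketch (the stage index and the variable names such as cvModel are assumptions; the Java casts are analogous):

    import org.apache.spark.ml.PipelineModel
    import org.apache.spark.ml.classification.GBTClassificationModel

    val bestPipeline = cvModel.bestModel.asInstanceOf[PipelineModel]
    val gbtFitted = bestPipeline.stages(0).asInstanceOf[GBTClassificationModel]
    println(gbtFitted.toDebugString)     // full ensemble structure
    println(gbtFitted.trees.length)      // number of trees in the ensemble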

Re: how to see Pipeline model information

2016-11-24 Thread Zhiliang Zhu
…November 24, 2016 2:15 AM, Xiaomeng Wan <shawn...@gmail.com> wrote: You can use pipelinemodel.stages(0).asInstanceOf[RandomForestModel]. The number (0 in the example) for stages depends on the order in which you call setStages. Shawn On 23 November 2016 at 10:21, Zhiliang Zhu <zchl.j...@yahoo.com.inval

how to see Pipeline model information

2016-11-23 Thread Zhiliang Zhu
Dear All, I am building a model with a Spark pipeline, and in the pipeline I use the Random Forest algorithm as one of its stages. If I just use Random Forest directly rather than through a pipeline, I can see information about the forest via rfModel.toDebugString() and rfModel.toString(). However, while it
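A minimal sketch of getting the same debug output back out of a fitted pipeline (the stage index 0 is an assumption; use the position of the RandomForestClassifier passed to setStages):

    import org.apache.spark.ml.PipelineModel
    import org.apache.spark.ml.classification.RandomForestClassificationModel

    val pipelineModel: PipelineModel = pipeline.fit(trainingData)
    val rfFitted = pipelineModel.stages(0).asInstanceOf[RandomForestClassificationModel]
    println(rfFitted.toDebugString)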

spark ml : auc on extreme distributed data

2016-08-14 Thread Zhiliang Zhu
Hi All, here I have a lot of data, around 1,000,000 rows; 97% of them are the negative class and 3% are the positive class. I applied the Random Forest algorithm to build the model and predict on the test data. For the data preparation: i. first randomly split all the data into training data
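With a 97/3 split, ROC AUC alone can look deceptively good; the area under the precision-recall curve is usually more informative. A hedged sketch with the RDD-based metrics API (scoresAndLabels is an assumed RDD[(Double, Double)] of (score, label) pairs from the fitted model):

    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

    val metrics = new BinaryClassificationMetrics(scoresAndLabels)
    println(s"areaUnderROC = ${metrics.areaUnderROC()}")
    println(s"areaUnderPR  = ${metrics.areaUnderPR()}")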

Re: the spark job is so slow - almost frozen

2016-07-20 Thread Zhiliang Zhu
…a static dataset small enough to work with, editing the query, then retesting, repeatedly until you cut the execution time by a significant fraction. - Using the Spark UI or spark shell to check for skew and make sure partitions are evenly distributed. On Jul 18, 2016, at 3:33 AM, Zhiliang Zhu

the spark job is so slow during shuffle - almost frozen

2016-07-18 Thread Zhiliang Zhu
…or clue is also good. Thanks in advance~ On Tuesday, July 19, 2016 11:05 AM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote: Hi Mungeol, thanks a lot for your help. I will try that. On Tuesday, July 19, 2016 9:21 AM, Mungeol Heo <mungeol@gmail.com> wrote:

Re: Spark driver getting out of memory

2016-07-18 Thread Zhiliang Zhu
Try to set --driver-memory xg; x should be as large as can be allowed. On Monday, July 18, 2016 6:31 PM, Saurav Sinha wrote: Hi, I am running a Spark job. Master memory - 5G, executor memory 10G (running on 4 nodes). My job is getting killed as the number of partitions increases

Re: the spark job is so slow - almost frozen

2016-07-18 Thread Zhiliang Zhu
…Because it's complex you can use something like the EXPLAIN command to show what is going on. On Jul 18, 2016, at 5:20 PM, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID> wrote: the SQL logic in the program is very complex, so I do not describe the detailed code here. On Monday, Jul

Re: the spark job is so slow - almost frozen

2016-07-18 Thread Zhiliang Zhu
The SQL logic in the program is very complex, so I do not describe the detailed code here. On Monday, July 18, 2016 6:04 PM, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID> wrote: Hi All, here we have one application; it needs to extract different columns from 6 hive

the spark job is so slow - almost frozen

2016-07-18 Thread Zhiliang Zhu
Hi All, here we have one application; it needs to extract different columns from 6 hive tables and then do some easy calculation. There are around 100,000 rows in each table, and finally it needs to output another table or file (with a consistent column format). However, after lots of
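The EXPLAIN suggestion from the reply above can also be invoked directly from code to spot where the wide shuffles happen. A minimal sketch (hiveContext and the table/column names are illustrative):

    val df = hiveContext.sql(
      "SELECT a.col1, b.col2 FROM table1 a JOIN table2 b ON a.key = b.key")
    df.explain(true)   // prints the logical and physical plans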

Re: spark job automatically killed without rhyme or reason

2016-06-23 Thread Zhiliang Zhu
…memory for example, in particular spark.yarn.executor.memoryOverhead. Everything else you mention is a symptom of YARN shutting down your jobs because your memory settings don't match what your app does. They're not problems per se, based on what you have provided. On Mon, Jun 20, 2016 at 9:17 AM, Zhiliang
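A sketch of raising that setting programmatically; the values are assumptions to be tuned, and the same keys can also be passed with --conf on spark-submit:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.executor.memory", "10g")
      .set("spark.yarn.executor.memoryOverhead", "2048")   // MB of off-heap headroom per executor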

Re: spark job automatically killed without rhyme or reason

2016-06-20 Thread Zhiliang Zhu
…shuffle operation?   --WBR, Alexander   From: Zhiliang Zhu Sent: 17 June 2016 14:10 To: User; kp...@hotmail.com Subject: Re: spark job automatically killed without rhyme or reason   Hi Alexander, is your yarn userlog just for the executor log? As those logs seem a

Re: spark job automatically killed without rhyme or reason

2016-06-17 Thread Zhiliang Zhu
currently ... Thank you in advance~ On Friday, June 17, 2016 6:53 PM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote: Hi Alexander, thanks a lot for your reply. Yes, it is submitted via yarn. Do you just mean the executor log file obtained by way of yarn logs -applicationId id? In this file

Re: spark job automatically killed without rhyme or reason

2016-06-17 Thread Zhiliang Zhu
…check yarn userlogs for more information…   --WBR, Alexander   From: Zhiliang Zhu Sent: 17 June 2016 9:36 To: Zhiliang Zhu; User Subject: Re: spark job automatically killed without rhyme or reason   Has anyone ever met a similar problem? It is quite strange...  On Friday, June 17, 2016 2:1

Re: spark job automatically killed without rhyme or reason

2016-06-17 Thread Zhiliang Zhu
in advance~ On Friday, June 17, 2016 6:53 PM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote: Hi Alexander, thanks a lot for your reply. Yes, it is submitted via yarn. Do you just mean the executor log file obtained by way of yarn logs -applicationId id? In this file, both in some containers'

Re: spark job automatically killed without rhyme or reason

2016-06-17 Thread Zhiliang Zhu
tasks are executed. In this situation, please check yarn userlogs for more information…   --WBR, Alexander   From: Zhiliang Zhu Sent: 17 June 2016 9:36 To: Zhiliang Zhu; User Subject: Re: spark job automatically killed without rhyme or reason   Has anyone ever met a similar problem, whic

Re: spark job automatically killed without rhyme or reason

2016-06-17 Thread Zhiliang Zhu
Has anyone ever met a similar problem? It is quite strange... On Friday, June 17, 2016 2:13 PM, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID> wrote: Hi All, I have a big job which takes more than one hour to run in full; however, it is very unreasonable for it to exit &

spark job killed without rhyme or reason

2016-06-17 Thread Zhiliang Zhu
Hi All, I have a big job which takes more than one hour to run in full; however, it very unreasonably exits and finishes midway (almost 80% of the job actually finished, but not all of it), without any apparent error or exception in the log. I submitted the same job many times; it

Re: test - what is the wrong while adding one column in the dataframe

2016-06-16 Thread Zhiliang Zhu
Just a test, since it seemed that the user email system had something wrong a while ago; it is okay now. On Friday, June 17, 2016 12:18 PM, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID> wrote: On Tuesday, May 17, 2016 10:44 AM, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID> wrote:

test - what is the wrong while adding one column in the dataframe

2016-06-16 Thread Zhiliang Zhu
On Tuesday, May 17, 2016 10:44 AM, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID> wrote: Hi All, for a given DataFrame created by Hive SQL, it is required to add one more column based on an existing column, while also keeping the previous columns in the

what is the wrong while adding one column in the dataframe

2016-05-16 Thread Zhiliang Zhu
Hi All, for a given DataFrame created by Hive SQL, it is required to add one more column based on an existing column, while also keeping the previous columns in the result DataFrame. final double DAYS_30 = 1000 * 60 * 60 * 24 * 30.0; // DAYS_30 seems difficult to call

how to add one more column in DataFrame

2016-05-16 Thread Zhiliang Zhu
Hi All, for a given DataFrame created by Hive SQL, it is required to add one more column based on an existing column, while also keeping the previous columns in the result DataFrame. final double DAYS_30 = 1000 * 60 * 60 * 24 * 30.0; // DAYS_30 seems difficult to call
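A minimal sketch of the usual way to do this with DataFrame.withColumn, which keeps all existing columns and appends the derived one (df and the column names are illustrative):

    import org.apache.spark.sql.functions._

    val DAYS_30 = 1000.0 * 60 * 60 * 24 * 30   // 30 days in milliseconds
    val result = df.withColumn("months_elapsed", (col("end_ts") - col("start_ts")) / DAYS_30)
    // result has every column of df plus the new "months_elapsed" column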

copy/mv hdfs file to another directory by spark program

2016-01-04 Thread Zhiliang Zhu
For some file on HDFS, it is necessary to copy/move it to another specific HDFS directory, and the name should stay unchanged. I need to do this inside the Spark program, not with hdfs commands. Is there any code for this? It does not seem to be covered by searching the Spark docs... Thanks in advance!
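Spark itself has no file-move API, but the Hadoop FileSystem API is available on the driver. A hedged sketch (the paths are illustrative):

    import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

    val hadoopConf = sc.hadoopConfiguration
    val fs = FileSystem.get(hadoopConf)
    val src = new Path("/data/in/part-00000")
    val dst = new Path("/data/archive/part-00000")

    fs.rename(src, dst)                                   // move; the name stays unchanged
    // or, to copy instead of move:
    FileUtil.copy(fs, src, fs, dst, false, hadoopConf)    // false = do not delete the source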

what is the proper number set about --num-executors etc

2015-12-31 Thread Zhiliang Zhu
In order to make the job run faster, some parameters can be specified on the command line, such as --executor-cores, --executor-memory and --num-executors... However, as tested, it seems those numbers cannot be set arbitrarily, or trouble is caused for the cluster. What is

Re: rdd only with one partition

2015-12-21 Thread Zhiliang Zhu
…based on a subset of the rows in rdd0? That way you can increase the parallelism. Cheers On Mon, Dec 21, 2015 at 9:40 AM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote: Hi Ted, thanks a lot for your kind reply. I need to convert this rdd0 into another rdd1; the rows of rdd1 are generated fro

Re: number limit of map for spark

2015-12-21 Thread Zhiliang Zhu
…cases? If there is no shuffle, you can collapse all these functions into one, right? In the meantime, it is not recommended to collect all data to the driver. Thanks. Zhan Zhang On Dec 21, 2015, at 3:44 AM, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID> wrote: Dear All, I need to iterate some job / RDD quite a

Re: number limit of map for spark

2015-12-21 Thread Zhiliang Zhu
…cases? If there is no shuffle, you can collapse all these functions into one, right? In the meantime, it is not recommended to collect all data to the driver. Thanks. Zhan Zhang On Dec 21, 2015, at 3:44 AM, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID> wrote: Dear All, I need to iterate some job

rdd only with one partition

2015-12-21 Thread Zhiliang Zhu
Dear All, for some RDD, if there is just one partition, then the operations and arithmetic run serially, and the RDD loses all the parallelism benefit of the Spark system... Is it exactly like that? Thanks very much in advance! Zhiliang
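Yes, a single-partition RDD is processed by a single task. Repartitioning restores parallelism at the cost of one shuffle; a minimal sketch (rdd0 stands for the single-partition RDD, and the partition count 64 is an assumption):

    val rdd1 = rdd0.repartition(64)      // full shuffle into 64 partitions
    println(rdd1.partitions.length)      // confirm the new partition count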

Re: [Beg for help] spark job with very low efficiency

2015-12-21 Thread Zhiliang Zhu
…However, as tested, it seemed that checkpoint is more costly than collect ... Hopefully you are using the Kryo serializer already. That would be all right. From your experience, does Kryo improve efficiency noticeably ... Regards, Sab On Mon, Dec 21, 2015 at 5:51 PM, Zhiliang Zhu <zchl.j...@yahoo.co
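Enabling Kryo is a small configuration change; whether it helps noticeably depends on how much serialized data is shuffled or cached. A sketch (the registered class is a placeholder):

    import org.apache.spark.SparkConf

    case class MyRecord(id: Long, value: Double)   // placeholder for your own classes

    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .registerKryoClasses(Array(classOf[MyRecord]))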

Re: number limit of map for spark

2015-12-21 Thread Zhiliang Zhu
…result depend on the last iteration? If so, how does it depend on it? I think either you can optimize your implementation, or Spark is not the right tool for your specific application. Thanks. Zhan Zhang  On Dec 21, 2015, at 10:43 AM, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID> wrote: What is differ

number limit of map for spark

2015-12-21 Thread Zhiliang Zhu
Dear All, I need to iterate some job / RDD quite a lot of times, but I am stuck on the problem that Spark only accepts around 350 chained map calls before it needs one action function; besides, dozens of actions obviously increase the run time. Is there any proper way ... As tested, there
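Chaining hundreds of maps builds a very long lineage, which is usually the real limit. One hedged workaround is to truncate the lineage periodically with checkpointing (the checkpoint interval and the per-iteration map below are illustrative):

    sc.setCheckpointDir("hdfs:///tmp/checkpoints")

    var rdd = sc.parallelize(1 to 1000000).map(_.toDouble)
    for (i <- 1 to 350) {
      rdd = rdd.map(x => x * 0.99 + 1.0)   // stand-in for the real per-iteration map
      if (i % 50 == 0) {
        rdd.checkpoint()                    // cut the lineage here
        rdd.count()                         // force the checkpoint to materialize
      }
    }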

Re: rdd only with one partition

2015-12-21 Thread Zhiliang Zhu
…false)(implicit ord: Ordering[T] = null) Cheers On Mon, Dec 21, 2015 at 2:47 AM, Zhiliang Zhu <zchl.j...@yahoo.com.invalid> wrote: Dear All, for some RDD, if there is just one partition, then the operations and arithmetic run serially, and the RDD loses all the parallelism ben

Re: Inverse of the matrix

2015-12-11 Thread Zhiliang Zhu
Use matrix SVD decomposition; Spark has the library: http://spark.apache.org/docs/latest/mllib-dimensionality-reduction.html#singular-value-decomposition-svd   On Thursday, December 10, 2015 7:33 PM, Arunkumar Pillai wrote: Hi, I need to find the inverse
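A hedged sketch of the distributed SVD mentioned above (rows is an assumed RDD[Vector]; from U, s and V a pseudoinverse can be assembled as V * diag(1/s) * U^T over the kept components):

    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    val mat = new RowMatrix(rows)
    val svd = mat.computeSVD(20, computeU = true)   // keep the top 20 singular values
    val U = svd.U    // distributed RowMatrix
    val s = svd.s    // Vector of singular values
    val V = svd.V    // local Matrix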

what's the way to access the last element from another partition

2015-12-08 Thread Zhiliang Zhu
…previous order among the elements, and will it also not work? Thanks very much in advance!  On Monday, December 7, 2015 11:32 AM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote: On Monday, December 7, 2015 10:37 AM, DB Tsai <dbt...@dbtsai.com> wrote: Only beginning a

is repartition very cost

2015-12-08 Thread Zhiliang Zhu
Hi All, I need to optimize an objective function with some linear constraints using a genetic algorithm. I would like to get as much parallelism for it as possible with Spark. repartition / shuffle may sometimes be used in it; however, is the repartition API very costly? Thanks in advance! Zhiliang

Re: is repartition very cost

2015-12-08 Thread Zhiliang Zhu
…threading engine). In general you need to do performance testing to see if a repartition is worth the shuffle time. A common model is to repartition the data once after ingest to achieve parallelism and avoid shuffles whenever possible later. From: Zhiliang Zhu [mailto:zchl.j...@yahoo.c

Re: the way to compare any two adjacent elements in one rdd

2015-12-06 Thread Zhiliang Zhu
https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Fri, Dec 4, 2015 at 10:30 PM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote: > Hi All, > > I would like to compare any two adjacent elements in one given rdd, just as > the single machine code part: > > int a[N] = {...}; > for (int i=0;

Re: the way to compare any two adjacent elements in one rdd

2015-12-06 Thread Zhiliang Zhu
…d API, or repartition? Thanks a lot in advance! Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Sun, Dec 6, 2015 at 6:27 PM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote: > > > > > On Saturday, Decemb

Re: the way to compare any two adjacent elements in one rdd

2015-12-05 Thread Zhiliang Zhu
On Saturday, December 5, 2015 3:52 PM, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID> wrote: Hi DB Tsai, thanks very much for your kind reply! Sorry, one more issue: as tested, it seems that filter can only return a JavaRDD of the same element type, not any other JavaRDD, is that right? Then it is not very convenient

the way to compare any two adjacent elements in one rdd

2015-12-04 Thread Zhiliang Zhu
Hi All, I would like to compare any two adjacent elements in one given RDD, just as in this single-machine code: int a[N] = {...}; for (int i = 0; i < N - 1; ++i) { compareFun(a[i], a[i+1]); } ... mapPartitions may work for some situations; however, it cannot compare elements in different
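A minimal sketch of the sliding approach that comes up later in this thread, which pairs each element with its successor across partition boundaries (rdd1 is an assumed RDD[Double]; compareFun is a placeholder for the real comparison):

    import org.apache.spark.mllib.rdd.RDDFunctions._

    def compareFun(a: Double, b: Double): Double = b - a   // placeholder comparison

    val compared = rdd1.sliding(2).map { case Array(a, b) => compareFun(a, b) }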

Re: the way to compare any two adjacent elements in one rdd

2015-12-04 Thread Zhiliang Zhu
…Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Fri, Dec 4, 2015 at 10:30 PM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote: > Hi All, > > I would like to compare any two adjacent elements in one given rdd,

what is algorithm to optimize function with nonlinear constraints

2015-11-19 Thread Zhiliang Zhu
Hi all, I have an optimization problem; I have googled a lot but still did not find the exact algorithm or third-party open-source package to apply to it. Its type is like this: Objective function: f(x1, x2, ..., xn)  (n >= 100, and f may be linear or non-linear). Constraint functions: x1 + x2 + ... +

Re: spark with breeze error of NoClassDefFoundError

2015-11-18 Thread Zhiliang Zhu
On Thursday, November 19, 2015 1:46 PM, Ted Yu <yuzhih...@gmail.com> wrote: Have you looked at https://github.com/scalanlp/breeze/wiki Cheers On Nov 18, 2015, at 9:34 PM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote: Dear Jack, as is known, Breeze is a numerical calculation package wr

Re: spark with breeze error of NoClassDefFoundError

2015-11-18 Thread Zhiliang Zhu
Dear Jack, as is known, Breeze is a numerical calculation package written in Scala; Spark MLlib also uses it as the underlying package for linear algebra. Here I am also preparing to use Breeze for nonlinear equation optimization; however, it seems that I cannot find the exact doc or API for Breeze
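For basic linear-algebra usage the Breeze API is compact; a minimal local (non-distributed) sketch solving A x = b with the backslash operator:

    import breeze.linalg.{DenseMatrix, DenseVector}

    val A = DenseMatrix((2.0, 1.0), (1.0, 3.0))
    val b = DenseVector(3.0, 5.0)
    val x = A \ b        // solves the linear system
    println(x)

For the optimization side, breeze.optimize (for example LBFGS) is the usual entry point, though the exact support for nonlinear constraints would need to be checked against the Breeze docs.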

Re: How to properly read the first number lines of file into a RDD

2015-11-17 Thread Zhiliang Zhu
…(Array(n_f))      val n_linesRDD = n_lines.map(n => {     // Read and return 5 lines (n._1) from the file (n._2)      }) Thanks, Best Regards On Thu, Oct 29, 2015 at 9:51 PM, Zhiliang Zhu <zchl.j...@yahoo.com.invalid> wrote: Hi All, there is some file with N + M lines, and I need to r

Re: could not understand issue about static spark Function (map / sortBy ...)

2015-11-10 Thread Zhiliang Zhu
…while constructing the Function object with new, and inside the Function inner class the inner normal function can be called. On Tuesday, November 10, 2015 5:12 PM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote: After more testing, the Function called by map/sortBy etc. must be defined as static, or it can be d

could not understand issue about static spark Function (map / sortBy ...)

2015-11-10 Thread Zhiliang Zhu
After more testing, the Function called by map/sortBy etc. must be defined as static, or it can be defined as non-static but must then be called from another static normal function. I am really confused by this. On Tuesday, November 10, 2015 4:12 PM, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID>

static spark Function as map

2015-11-10 Thread Zhiliang Zhu
On Tuesday, November 10, 2015 11:42 AM, Deng Ching-Mallete <och...@apache.org> wrote: Hi Zhiliang, you should be able to see them in the executor logs, which you can view via the Spark UI, in the Executors page (stderr log). HTH, Deng On Tue, Nov 10, 2015 at 11:33 AM, Zhiliang Zhu

could not see the print out log in spark functions as mapPartitions

2015-11-09 Thread Zhiliang Zhu
Hi All, I need to debug a Spark job; my usual way is to print log output. However, some bug is inside Spark functions such as mapPartitions etc., and no log printed from those functions can be found... Would you help point out the way to see the log from inside Spark's own functions such as mapPartitions? Or, what is

Re: could not see the print out log in spark functions as mapPartitions

2015-11-09 Thread Zhiliang Zhu
…You should be able to see them in the executor logs, which you can view via the Spark UI, in the Executors page (stderr log). HTH, Deng On Tue, Nov 10, 2015 at 11:33 AM, Zhiliang Zhu <zchl.j...@yahoo.com.invalid> wrote: Hi All, I need to debug a Spark job; my usual way is to print log output, howeve

Re: could not see the print out log in spark functions as mapPartitions

2015-11-09 Thread Zhiliang Zhu
Also for the Spark UI, that is, the log from other places can be found, but the log from functions such as mapPartitions cannot. On Tuesday, November 10, 2015 11:52 AM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote: Dear Ching-Mallete, there are machines master01, master02 and ma

Re: could not see the print out log in spark functions as mapPartitions

2015-11-09 Thread Zhiliang Zhu
Hi Ching-Mallete, I have found the log and the reason for it. Thanks a lot! Zhiliang  On Tuesday, November 10, 2015 12:23 PM, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID> wrote: Also for the Spark UI, that is, the log from other places can be found, but the log from the fun

Re: apply simplex method to fix linear programming in spark

2015-11-04 Thread Zhiliang Zhu
…breeze for the enhancement. Where is the API or link for the breeze quadratic minimizer integrated with Spark? And where is the breeze lpsolver... Alternatively you can use the breeze lpsolver as well, which uses simplex from Apache math. Thank you, Zhiliang  On Nov 4, 2015 1:05 AM, "Z

Re: apply simplex method to fix linear programming in spark

2015-11-04 Thread Zhiliang Zhu
ple integration > with spark. ecos runs as jni process in every executor. > > On Nov 1, 2015 9:52 AM, "Zhiliang Zhu" <zchl.j...@yahoo.com.invalid> wrote: >> >> Hi Ted Yu, >> >> Thanks very much for your kind reply. >> Do you just mean that in s

Re: apply simplex method to fix linear programming in spark

2015-11-04 Thread Zhiliang Zhu
ecos runs as jni process in every executor. > > On Nov 1, 2015 9:52 AM, "Zhiliang Zhu" <zchl.j...@yahoo.com.invalid> wrote: >> >> Hi Ted Yu, >> >> Thanks very much for your kind reply. >> Do you just mean that in spark there is no specific

spark filter function

2015-11-04 Thread Zhiliang Zhu
Hi All, I would like to filter some elements in a given RDD, keeping only the ones needed, so that the row count of the result RDD is smaller. So I chose the filter function; however, in testing, the filter function only accepts a Boolean predicate, that is to say, will only a JavaRDD be returned for
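A minimal sketch: the Boolean only describes the predicate; the returned RDD has the same element type as the input, just with fewer rows (the data and the threshold are illustrative):

    val values = sc.parallelize(Seq(0.5, 3.0, -1.0, 7.2))
    val kept = values.filter(x => x > 1.0)   // still an RDD[Double], now containing 3.0 and 7.2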

Re: [Spark MLlib] about linear regression issue

2015-11-04 Thread Zhiliang Zhu
…but currently, there is no open source implementation in Spark. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Sun, Nov 1, 2015 at 9:22 AM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote: > Dear All, > > As for

[Spark MLlib] about linear regression issue

2015-11-01 Thread Zhiliang Zhu
Dear All, for N-dimensional linear regression, when the number of labeled training points (or the rank of the labeled point space) is less than N, then from a math perspective the weights of the trained linear model may not be unique. However, the output of model.weight() from Spark may be with

Re: apply simplex method to fix linear programming in spark

2015-11-01 Thread Zhiliang Zhu
…2015 at 9:37 AM, Zhiliang Zhu <zchl.j...@yahoo.com.invalid> wrote: Dear All, as I am facing a typical linear programming issue, and I know the simplex method is specifically for solving LP problems, may I ask whether there is already some mature package in Spark for the simplex method.

apply simplex method to fix linear programming in spark

2015-11-01 Thread Zhiliang Zhu
Dear All, as I am facing a typical linear programming issue, and I know the simplex method is specifically for solving LP problems, may I ask whether there is already some mature package in Spark for the simplex method... Thank you very much~ Best Wishes! Zhiliang

How to properly read the first number lines of file into a RDD

2015-10-29 Thread Zhiliang Zhu
Hi All, there is some file with N + M lines, and I need to read the first N lines into one RDD. 1. i) Read all the N + M lines as one RDD, ii) select the RDD's top N rows; that may be one solution. 2. If some broadcast variable is introduced to hold N, then it is used to decide, while mapping, the
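A hedged sketch of a third option, using zipWithIndex so nothing beyond a filter is needed (the path and N are illustrative):

    val n = 1000L
    val firstN = sc.textFile("hdfs:///data/input.txt")
      .zipWithIndex()                        // (line, 0-based index)
      .filter { case (_, idx) => idx < n }
      .map { case (line, _) => line }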

is it proper to make RDD as function parameter in the codes

2015-10-27 Thread Zhiliang Zhu
Dear All, I will program a small project with Spark, and run speed is a big concern. I have a question: since an RDD is always big on the cluster, is it proper to pass an RDD variable as a parameter in a function call? Thank you, Zhiliang

Re: [SPARK MLLIB] could not understand the wrong and inscrutable result of Linear Regression codes

2015-10-26 Thread Zhiliang Zhu
…Just curious, are you trying to solve systems of linear equations? If so, you can probably try breeze. On Sun, Oct 25, 2015 at 9:10 PM, Zhiliang Zhu <zchl.j...@yahoo.com.invalid> wrote: > > > > On Monday, October 26, 2015 11:26 AM, Zhiliang Zhu > <zchl.j...@yahoo.com.INVALID> wrote:

Re: [SPARK MLLIB] could not understand the wrong and inscrutable result of Linear Regression codes

2015-10-26 Thread Zhiliang Zhu
…includes an intercept in the model, e.g. label = intercept + features dot weights. To get the result you want, you need to force the intercept to be zero. Just curious, are you trying to solve systems of linear equations? If so, you can probably try breeze. On Sun, Oct 25, 2015 at 9:10 PM, Zhiliang

Re: [SPARK MLLIB] could not understand the wrong and inscrutable result of Linear Regression codes

2015-10-25 Thread Zhiliang Zhu
On Monday, October 26, 2015 11:26 AM, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID> wrote: Hi DB Tsai, thanks very much for your kind help. I get it now. Sorry, there is another issue: the weight/coefficient result is perfect when A is a triangular matrix; however,

Re: [SPARK MLLIB] could not understand the wrong and inscrutable result of Linear Regression codes

2015-10-25 Thread Zhiliang Zhu
…DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Sun, Oct 25, 2015 at 10:14 AM, Zhiliang Zhu <zchl.j...@yahoo.com.invalid> wrote: > Dear All, > > I have some program as below which makes me very much confused and > inscrutable, it is about m

[SPARK MLLIB] could not understand the wrong and inscrutable result of Linear Regression codes

2015-10-25 Thread Zhiliang Zhu
Dear All, I have a program, below, which leaves me very confused; it is about a multi-dimensional linear regression model. The weight / coefficient is always perfect while the dimension is smaller than 4, otherwise it is wrong all the time. Or, whether the
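The replies in this thread point out that Spark fits an intercept by default, so the recovered weights differ from the exact solution of label = features dot weights. A hedged sketch with the spark.ml API, forcing the intercept to zero (trainingDF with label/features columns is an assumption; whether the original code used the older mllib API is not known here):

    import org.apache.spark.ml.regression.LinearRegression

    val lr = new LinearRegression()
      .setFitIntercept(false)   // model becomes label = features dot weights
      .setRegParam(0.0)
    val lrModel = lr.fit(trainingDF)
    println(lrModel.coefficients)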

How to get inverse Matrix / RDD or how to solve linear system of equations

2015-10-23 Thread Zhiliang Zhu
Hi Sujit and All, currently I am stuck on a big difficulty and am eager to get some help from you. There is a big linear system of equations: Ax = b, where A has N rows and N columns, N is very large, and b = [0, 0, ..., 0, 1]^T. Then I will solve it to get x = [x1, x2, ..., xn]^T. The

Re: How to get inverse Matrix / RDD or how to solve linear system of equations

2015-10-23 Thread Zhiliang Zhu
…apache.org/docs/1.2.0/mllib-dimensionality-reduction.html [2] http://math.stackexchange.com/questions/458404/how-can-we-compute-pseudoinverse-for-any-matrix On Fri, Oct 23, 2015 at 2:19 AM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote: Hi Sujit and All, currently I am

[Spark MLlib] How to apply spark ml given models for questions with general background

2015-10-19 Thread Zhiliang Zhu
Dear All, I am new to spark ml. I have a project with a given math model, and I would like to get its optimized solution. It is very similar to a spark mllib application. However, the key problem for me is that the given math model does not obviously belong to the models (as

[Spark ML] How to extends MLlib's optimization algorithm

2015-10-15 Thread Zhiliang Zhu
Dear All, I would like to use spark ml to develop a project related to optimization algorithms; however, in Spark 1.4.1 it seems that under ml's optimizer there are only about 2 optimization algorithms. My project may need more kinds of optimization algorithms, so how would I use spark

Re: How to properly set conf/spark-env.sh for spark to run on yarn

2015-09-27 Thread Zhiliang Zhu
Hi All, would some expert help me with this issue... I would appreciate your kind help very much! Thank you! Zhiliang  On Sunday, September 27, 2015 7:40 PM, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID> wrote: Hi Alexis, Gavin, thanks very much for your kind comm

Re: how to submit the spark job outside the cluster

2015-09-25 Thread Zhiliang Zhu
It seems that this is due to Spark's SPARK_LOCAL_IP setting. export SPARK_LOCAL_IP=localhost will not work. Then, how should it be set? Thank you all~~ On Friday, September 25, 2015 5:57 PM, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID> wrote: Hi Steve, thanks a lot for your

How to properly set conf/spark-env.sh for spark to run on yarn

2015-09-25 Thread Zhiliang Zhu
…or for some other reasons... This issue is urgent for me; would some expert provide some help with this problem... I will show sincere appreciation for your help. Thank you! Best Regards, Zhiliang On Friday, September 25, 2015 7:53 PM, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID>

How to properly set conf/spark-env.sh for spark to run on yarn

2015-09-25 Thread Zhiliang Zhu
Hi All, I would like to submit the Spark job from another remote machine outside the cluster. I also copied the hadoop/spark conf files to the remote machine; a Hadoop job can then be submitted, but a Spark job cannot. In spark-env.sh, it may be that SPARK_LOCAL_IP is not properly set, or

Re: How to properly set conf/spark-env.sh for spark to run on yarn

2015-09-25 Thread Zhiliang Zhu
…on the linux command side? Best Regards, Zhiliang On Saturday, September 26, 2015 10:07 AM, Gavin Yue <yue.yuany...@gmail.com> wrote: Print out your env variables and check first. On Sep 25, 2015, at 18:43, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID> wrote:

How to set spark envoirnment variable SPARK_LOCAL_IP in conf/spark-env.sh

2015-09-25 Thread Zhiliang Zhu
Hi all, the Spark job will run on YARN. Whether I do not set SPARK_LOCAL_IP at all, or set it as export SPARK_LOCAL_IP=localhost    # or set it to the specific node IP in the specific Spark install directory, it works well to submit the Spark job on the master node of the cluster; however, it will fail by

Re: how to submit the spark job outside the cluster

2015-09-25 Thread Zhiliang Zhu
And the remote machine is not in the same local area network as the cluster. On Friday, September 25, 2015 12:28 PM, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID> wrote: Hi Zhan, I have done that per your kind help. However, I can only use "hadoop fs -ls/-mkdir/-rm XX

Re: how to submit the spark job outside the cluster

2015-09-25 Thread Zhiliang Zhu
…ste...@hortonworks.com> wrote: On 25 Sep 2015, at 05:25, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID> wrote: However, I can only use "hadoop fs -ls/-mkdir/-rm XXX" commands to operate on the remote machine with the gateway, which means the namenode is reachable; all those commands

Re: how to submit the spark job outside the cluster

2015-09-24 Thread Zhiliang Zhu
…Thanks. Zhan Zhang On Sep 22, 2015, at 8:14 PM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote: Hi Zhan, yes, I get it now. I have never deployed the hadoop configuration locally, and cannot find the specific doc; would you help provide the doc for doing that... Thank you, Zhiliang On Wednesday, Septemb

How to subtract two RDDs with same size

2015-09-23 Thread Zhiliang Zhu
Hi All, there are two RDDs: RDD rdd1 and RDD rdd2; that is to say, rdd1 and rdd2 are like DataFrames, or matrices with the same number of rows and columns. I would like to get an RDD rdd3 where each element of rdd3 is the difference between the elements of rdd1 and rdd2 at the same position,
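A minimal sketch with RDD.zip, which pairs elements by position provided the two RDDs have the same partitioning and the same number of rows (the sample data is illustrative):

    val rdd1 = sc.parallelize(Seq(Array(1.0, 2.0), Array(3.0, 4.0)), 2)
    val rdd2 = sc.parallelize(Seq(Array(0.5, 1.0), Array(1.0, 2.0)), 2)

    // element-wise subtraction at the same positions
    val rdd3 = rdd1.zip(rdd2).map { case (a, b) =>
      a.zip(b).map { case (x, y) => x - y }
    }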

Re: How to subtract two RDDs with same size

2015-09-23 Thread Zhiliang Zhu
There is a matrix add API; might I map each row element of rdd2 to its negative, then take rdd1 and rdd2 and call add? Or is there some better way ... On Wednesday, September 23, 2015 3:11 PM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote: Hi All, there are two RDDs: RDD<Array> rdd1, a

Re: How to subtract two RDDs with same size

2015-09-23 Thread Zhiliang Zhu
…Array(0.0, -8.0, 0.0)) -sujit On Wed, Sep 23, 2015 at 12:23 AM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote: There is a matrix add API; might I map each row element of rdd2 to its negative, then take rdd1 and rdd2 and call add? Or is there some better way ... On Wednesday, September 23, 2015 3:11 P

how to submit the spark job outside the cluster

2015-09-22 Thread Zhiliang Zhu
Dear Experts, the Spark job runs on the cluster via YARN. The job can be submitted from a machine in the cluster; however, I would like to submit the job from another machine which does not belong to the cluster. I know that for a Hadoop job this can be done by way of another

Re: how to submit the spark job outside the cluster

2015-09-22 Thread Zhiliang Zhu
…Sep 22, 2015, at 7:49 PM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote: Hi Zhan, thanks very much for your helpful comment. I also thought it would be similar to Hadoop job submission; however, I was not sure whether it is like that when it comes to Spark. Have you ever tried that for Spark... Would

Re: how to submit the spark job outside the cluster

2015-09-22 Thread Zhiliang Zhu
…machine, and point HADOOP_CONF_DIR in Spark to the configuration. Thanks Zhan Zhang On Sep 22, 2015, at 6:37 PM, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID> wrote: Dear Experts, the Spark job runs on the cluster via YARN. The job can be submitted from a machine in

Re: how to submit the spark job outside the cluster

2015-09-22 Thread Zhiliang Zhu
…the former is used to access HDFS, and the latter is used to launch applications on top of YARN. Then in spark-env.sh, you add export HADOOP_CONF_DIR=/etc/hadoop/conf. Thanks. Zhan Zhang On Sep 22, 2015, at 8:14 PM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote: Hi Zhan, yes, I get it now. I h

Re: How to get a new RDD by ordinarily subtract its adjacent rows

2015-09-22 Thread Zhiliang Zhu
Dear Sujit, since you are experienced with Spark, may I ask whether it is convenient for you to comment on my dilemma in using Spark for an application with an R background ... Thank you very much! Zhiliang On Tuesday, September 22, 2015 1:45 AM, Zhiliang Zhu <zch

How to get a new RDD by ordinarily subtract its adjacent rows

2015-09-21 Thread Zhiliang Zhu
Dear All, I have spent many days thinking about this issue, but without any success... I would appreciate all your kind help. There is an RDD rdd1; I would like to get a new RDD rdd2 where each row rdd2[i] = rdd1[i] - rdd1[i - 1]. What kind of API or function should I use... Thanks very

Re: how to get RDD from two different RDDs with cross column

2015-09-21 Thread Zhiliang Zhu
…PairRDD, and then use outer join. Does that make sense? On Mon, Sep 21, 2015 at 8:37 PM Zhiliang Zhu <zchl.j...@yahoo.com> wrote: Dear Romi, Priya, Sujit, Shivaram and all, I have spent many days thinking about this issue, but without any good enough solution... I would appreciate your a

Re: How to get a new RDD by ordinarily subtract its adjacent rows

2015-09-21 Thread Zhiliang Zhu
…g for the order of the items. What exactly are you trying to accomplish? Romi Kuntsman, Big Data Engineer http://www.totango.com On Mon, Sep 21, 2015 at 2:29 PM, Zhiliang Zhu <zchl.j...@yahoo.com.invalid> wrote: Dear All, I have spent many days thinking about this issue, but without

how to get RDD from two different RDDs with cross column

2015-09-21 Thread Zhiliang Zhu
Dear Romi, Priya, Sujit, Shivaram and all, I have spent many days thinking about this issue, but without any good enough solution... I would appreciate all your kind help. There is an RDD rdd1 and another RDD rdd2 (rdd2 can be a PairRDD, or a DataFrame with two columns
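A hedged sketch of the keyBy + join approach suggested in the reply above (the record types and key-extraction functions are assumptions):

    case class RowA(id: Long, x: Double)
    case class RowB(id: Long, y: Double)

    val rddA = sc.parallelize(Seq(RowA(1, 1.5), RowA(2, 2.5)))
    val rddB = sc.parallelize(Seq(RowB(1, 10.0), RowB(3, 30.0)))

    // key both RDDs by the shared column, then outer-join on it
    val joined = rddA.keyBy(_.id).fullOuterJoin(rddB.keyBy(_.id))
    // joined: RDD[(Long, (Option[RowA], Option[RowB]))]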

Re: How to get a new RDD by ordinarily subtract its adjacent rows

2015-09-21 Thread Zhiliang Zhu
…spark/mllib/rdd/SlidingRDD.html So maybe something like this: new SlidingRDD(rdd1, 2, ClassTag$.apply(Class)) -sujit On Mon, Sep 21, 2015 at 9:16 AM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote: Hi Sujit, I appreciate your kind help very much~ It seems to be OK; however, do you know the

Re: How to get a new RDD by ordinarily subtract its adjacent rows

2015-09-21 Thread Zhiliang Zhu
…September 21, 2015 11:48 PM, Sujit Pal <sujitatgt...@gmail.com> wrote: Hi Zhiliang, would something like this work? val rdd2 = rdd1.sliding(2).map(v => v(1) - v(0)) -sujit On Mon, Sep 21, 2015 at 7:58 AM, Zhiliang Zhu <zchl.j...@yahoo.com.invalid> wrote: Hi Romi, T

Re: How to get a new RDD by ordinarily subtract its adjacent rows

2015-09-21 Thread Zhiliang Zhu
…wrote: Hi Zhiliang, would something like this work? val rdd2 = rdd1.sliding(2).map(v => v(1) - v(0)) -sujit On Mon, Sep 21, 2015 at 7:58 AM, Zhiliang Zhu <zchl.j...@yahoo.com.invalid> wrote: Hi Romi, thanks very much for your kind help comment~~ In fact there is some