Re: Possible bug on Spark Yarn Client (1.5.1) during kerberos mode ?

2015-10-22 Thread Chester Chen
Steven, you summarized it mostly correctly, but there are a couple of points I want to emphasize. Not every cluster has the Hive service enabled, so the YARN client shouldn't try to get the Hive delegation token just because security mode is enabled. The YARN client code can check if the
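The argument above is that security mode alone should not trigger a Hive delegation-token request. It can be sketched as a guard condition. This is a hypothetical plain-Java illustration, not Spark's YARN `Client` code. The class name is made up, and a `Map` stands in for the Hadoop/Hive configuration. Treating a blank `hive.metastore.uris` setting as "no Hive service deployed" is an assumed signal.

```java
import java.util.Map;

// Hypothetical sketch, not Spark's YARN Client code: decides whether a client
// should ask for a Hive metastore delegation token.
final class HiveTokenPolicy {
    private HiveTokenPolicy() {}

    // `conf` is a plain Map standing in for the Hadoop/Hive configuration (assumption).
    static boolean shouldFetchHiveToken(boolean securityEnabled, Map<String, String> conf) {
        if (!securityEnabled) {
            return false; // without Kerberos there is no delegation token to fetch
        }
        // Assumption: a missing or blank hive.metastore.uris means no Hive service
        // is deployed, so security mode alone is not a reason to contact one.
        String uris = conf.getOrDefault("hive.metastore.uris", "");
        return !uris.trim().isEmpty();
    }
}
```

Under these assumptions, a Kerberized cluster with no metastore URI configured gets `false`, so a client using this guard would not try to reach a Hive service that does not exist.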

Re: Possible bug on Spark Yarn Client (1.5.1) during kerberos mode ?

2015-10-22 Thread Doug Balog
> On Oct 21, 2015, at 8:45 PM, Chester Chen wrote: > > Doug, > thanks for responding. > >> I think Spark just needs to be compiled against 1.2.1 > > Can you elaborate on this, or on the specific command you are referring to? > > In our build.scala, I was including the

repartitionAndSortWithinPartitions task shuffle phase is very slow

2015-10-22 Thread 周千昊
Hi, Spark community, I have an application that I am trying to migrate from MR to Spark. It does some calculations on data from Hive and writes HFiles, which are then bulk loaded into an HBase table. Details as follows: Rdd input = getSourceInputFromHive() Rdd>
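This sketch is for context on the operator named in the subject. `repartitionAndSortWithinPartitions` redistributes key-value records with a supplied partitioner and sorts the records inside each resulting partition by key. That matches what HFile output needs: cells written in sorted key order. The in-memory plain-Java version below reproduces only those semantics. It does not model Spark's shuffle, where the sort is pushed down into the shuffle machinery. The partition function mirrors Spark's `HashPartitioner` (non-negative `hashCode` modulo the partition count). The class name is hypothetical, and null keys are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of repartitionAndSortWithinPartitions semantics.
final class RepartitionAndSortSketch {
    private RepartitionAndSortSketch() {}

    // Mirrors HashPartitioner's rule: non-negative hashCode modulo numPartitions.
    static int partitionFor(Object key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Route each record to its partition, then sort every partition by key.
    static <K extends Comparable<K>, V> List<List<Map.Entry<K, V>>> apply(
            List<Map.Entry<K, V>> records, int numPartitions) {
        List<List<Map.Entry<K, V>>> partitions = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            partitions.add(new ArrayList<>());
        }
        for (Map.Entry<K, V> record : records) {
            partitions.get(partitionFor(record.getKey(), numPartitions)).add(record);
        }
        for (List<Map.Entry<K, V>> partition : partitions) {
            // Keys ascending within each partition; no ordering across partitions.
            partition.sort((a, b) -> a.getKey().compareTo(b.getKey()));
        }
        return partitions;
    }
}
```

For example, integer keys 9, -3, 4, 1, 7, -8, 2 split into 3 partitions. Partition 1 becomes -8, 1, 4, 7: every key that hashes there, in ascending order.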

Re: Possible bug on Spark Yarn Client (1.5.1) during kerberos mode ?

2015-10-22 Thread Charmee Patel
A similar issue occurs when interacting with Hive secured by Sentry: https://issues.apache.org/jira/browse/SPARK-9042 By changing how the HiveContext instance is created, this issue might also be resolved. On Thu, Oct 22, 2015 at 11:33 AM Steve Loughran wrote: > On 22 Oct

Re: repartitionAndSortWithinPartitions task shuffle phase is very slow

2015-10-22 Thread Reynold Xin
Why do you use glom? It seems unnecessarily expensive to materialize each partition in memory. On Thu, Oct 22, 2015 at 2:02 AM, 周千昊 wrote: > Hi, Spark community, > I have an application that I am trying to migrate from MR to Spark. > It does some calculations on data from
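Reynold's point concerns access patterns. In Spark, `glom()` hands each partition to your code as a single in-memory array. `mapPartitions` instead hands over an iterator that can be consumed one record at a time. The plain-Java sketch below contrasts the two patterns with no Spark dependency. An `Iterator<Long>` stands in for one partition, and the summation is a hypothetical placeholder for the MR-derived calculation. Class and method names are made up.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Plain-Java illustration (no Spark): an Iterator<Long> stands in for one partition.
final class PartitionAccess {
    private PartitionAccess() {}

    // glom()-style: copy the entire partition into memory first, then compute.
    // Extra memory grows with the partition size.
    static long sumMaterialized(Iterator<Long> partition) {
        List<Long> buffer = new ArrayList<>();
        while (partition.hasNext()) {
            buffer.add(partition.next());
        }
        long sum = 0;
        for (long value : buffer) {
            sum += value;
        }
        return sum;
    }

    // mapPartitions-style: consume records one at a time; only the running
    // total is kept, so extra memory does not grow with the partition size.
    static long sumStreaming(Iterator<Long> partition) {
        long sum = 0;
        while (partition.hasNext()) {
            sum += partition.next();
        }
        return sum;
    }
}
```

Both methods return the same result. Only the streaming version avoids holding the whole partition at once, which is the cost Reynold is pointing at, provided the per-partition logic can be written against an iterator.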

Re: Ability to offer initial coefficients in ml.LogisticRegression

2015-10-22 Thread DB Tsai
There is a JIRA for this. I know Holden is interested in this. On Thursday, October 22, 2015, YiZhi Liu wrote: > Would someone mind giving a hint? > > 2015-10-20 15:34 GMT+08:00 YiZhi Liu: > > Hi all, > > > > I noticed that in

Re: Possible bug on Spark Yarn Client (1.5.1) during kerberos mode ?

2015-10-22 Thread chester
Doug, we are not trying to compile against a different version of Hive. The 1.2.1.spark hive-exec is specified in the Spark 1.5.2 POM file. We are moving from Spark 1.3.1 to 1.5.1 and are simply trying to supply the needed dependency. The rest of the application (besides Spark) simply uses Hive 0.13.1.

Re: Guaranteed processing orders of each batch in Spark Streaming

2015-10-22 Thread Akhil Das
I guess the order is guaranteed unless you set spark.streaming.concurrentJobs to a number higher than 1. Thanks, Best Regards On Mon, Oct 19, 2015 at 12:28 PM, Renjie Liu wrote: > Hi, all: > I've read the source code and it seems that there is no guarantee that the >
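Akhil's answer depends on how the streaming scheduler runs batch jobs. The simulation below assumes that jobs are submitted in batch order to a fixed-size thread pool sized by `spark.streaming.concurrentJobs`, with a default of 1. It is a plain-Java model under that assumption, not Spark code, and the class name is hypothetical. Earlier batches deliberately sleep longer, so with more than one thread a later batch can overtake an earlier one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Simulation only: the pool size plays the role of spark.streaming.concurrentJobs (assumption).
final class BatchOrdering {
    private BatchOrdering() {}

    // Submits `batches` jobs in batch order and returns the order in which they finished.
    static List<Integer> completionOrder(int batches, int concurrentJobs) {
        List<Integer> completed = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(concurrentJobs);
        for (int i = 0; i < batches; i++) {
            final int batch = i;
            pool.submit(() -> {
                // Earlier batches do "more work", so they finish later if allowed to overlap.
                Thread.sleep((batches - batch) * 5L);
                completed.add(batch);
                return null;
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        synchronized (completed) {
            return new ArrayList<>(completed);
        }
    }
}
```

With `concurrentJobs = 1`, the single worker drains its FIFO queue, so the returned list always matches submission order. With a larger pool, the uneven sleeps make later batches likely, though not certain, to finish first.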

Re: Possible bug on Spark Yarn Client (1.5.1) during kerberos mode ?

2015-10-22 Thread Chester Chen
Thanks, Steve. I like the slides on Kerberos; I have enough scars from Kerberos while trying to integrate it with Pig, MapRed, Hive JDBC, HCatalog, Spark, etc. I am still having trouble making impersonation work for HCatalog. I might send you an offline email to ask for some pointers

Re: Possible bug on Spark Yarn Client (1.5.1) during kerberos mode ?

2015-10-22 Thread Steve Loughran
On 22 Oct 2015, at 21:54, Chester Chen wrote: Thanks, Steve. I like the slides on Kerberos; I have enough scars from Kerberos while trying to integrate it with Pig, MapRed, Hive JDBC, HCatalog, Spark, etc. I am still having

Re: Possible bug on Spark Yarn Client (1.5.1) during kerberos mode ?

2015-10-22 Thread Steve Loughran
On 22 Oct 2015, at 19:32, Chester Chen wrote: Steven, you summarized it mostly correctly, but there are a couple of points I want to emphasize. Not every cluster has the Hive service enabled, so the YARN client shouldn't try to get the

Re: Set numExecutors by sparklaunch

2015-10-22 Thread Luc Bourlier
Hi, I don't know much about your particular use case, but most (if not all) of the Spark command-line parameters can also be specified as properties. You should try SparkLauncher.setConf("spark.executor.instances", "3"). HTH, Luc Luc Bourlier *Spark Team - Typesafe, Inc.*

Re: repartitionAndSortWithinPartitions task shuffle phase is very slow

2015-10-22 Thread 周千昊
+Kylin dev list. On Friday, October 23, 2015 at 10:20 AM, 周千昊 wrote: > Hi, Reynold, > We use glom() because it makes it easy to adapt the calculation logic > already implemented in MR. And to be clear, we are still in the POC stage. > Since the results show there is almost no difference between this >

Re: Ability to offer initial coefficients in ml.LogisticRegression

2015-10-22 Thread YiZhi Liu
Thank you, Tsai. Holden, would you mind posting the JIRA issue ID here? I searched but found nothing. Thanks. 2015-10-23 1:36 GMT+08:00 DB Tsai: > There is a JIRA for this. I know Holden is interested in this. > > > On Thursday, October 22, 2015, YiZhi Liu

Re: repartitionAndSortWithinPartitions task shuffle phase is very slow

2015-10-22 Thread 周千昊
Hi, Reynold, we use glom() because it makes it easy to adapt the calculation logic already implemented in MR. And to be clear, we are still in the POC stage. Since the results show there is almost no difference between this glom stage and the MR mapper, using glom here might not be the issue. I

Re: Possible bug on Spark Yarn Client (1.5.1) during kerberos mode ?

2015-10-22 Thread Steve Loughran
On 22 Oct 2015, at 08:25, Chester Chen wrote: Doug, we are not trying to compile against a different version of Hive. The 1.2.1.spark hive-exec is specified in the Spark 1.5.2 POM file. We are moving from Spark 1.3.1 to 1.5.1 and are simply trying

Trouble creating JIRA issue

2015-10-22 Thread Richard Marscher
Hi, I'm following the guidelines for contributing code to Spark and am trying to create a related JIRA issue. I'm logged into my account on issues.apache.org, but I don't seem to have an option to create an issue, only to browse/search existing ones. Any help would be appreciated! Thanks --

Re: Trouble creating JIRA issue

2015-10-22 Thread Ted Yu
You can use the following link: https://issues.apache.org/jira/secure/CreateIssue!default.jspa Remember to select Spark as the project. On Thu, Oct 22, 2015 at 9:38 AM, Richard Marscher wrote: > Hi, > > I'm working on following the guidelines for contributing code to

Re: Ability to offer initial coefficients in ml.LogisticRegression

2015-10-22 Thread YiZhi Liu
Would someone mind giving a hint? 2015-10-20 15:34 GMT+08:00 YiZhi Liu: > Hi all, > > I noticed that in ml.classification.LogisticRegression, users are not > allowed to set initial coefficients, while it is supported in > mllib.classification.LogisticRegressionWithSGD. >
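The request is for a way to pass starting coefficients to the optimizer, as mllib's LogisticRegressionWithSGD already allows. The plain-Java sketch below shows what such a parameter controls. It runs batch gradient descent on the mean logistic loss (labels in {0, 1}) from a caller-supplied starting vector. This is an illustration only, not Spark's ml or mllib implementation, and the class and method names are hypothetical.

```java
// Plain-Java illustration, not Spark's optimizer: batch gradient descent on the
// mean logistic loss, starting from caller-supplied coefficients.
final class WarmStartLogisticRegression {
    private WarmStartLogisticRegression() {}

    static double[] fit(double[][] x, double[] y, double[] initial, double stepSize, int iterations) {
        double[] w = initial.clone(); // never modify the caller's starting point
        int n = x.length;
        for (int it = 0; it < iterations; it++) {
            double[] grad = new double[w.length];
            for (int i = 0; i < n; i++) {
                double p = sigmoid(dot(w, x[i]));
                for (int j = 0; j < w.length; j++) {
                    grad[j] += (p - y[i]) * x[i][j]; // gradient of the log loss
                }
            }
            for (int j = 0; j < w.length; j++) {
                w[j] -= stepSize * grad[j] / n; // step along the mean gradient
            }
        }
        return w;
    }

    static double meanLogLoss(double[][] x, double[] y, double[] w) {
        double total = 0;
        for (int i = 0; i < x.length; i++) {
            double p = sigmoid(dot(w, x[i]));
            total -= y[i] * Math.log(p) + (1 - y[i]) * Math.log(1 - p);
        }
        return total / x.length;
    }

    private static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    private static double dot(double[] a, double[] b) {
        double s = 0;
        for (int j = 0; j < a.length; j++) {
            s += a[j] * b[j];
        }
        return s;
    }
}
```

With zero iterations, `fit` returns the starting point unchanged. A starting point closer to a good solution begins at a lower loss, which is the usual motivation for exposing initial coefficients (warm starts).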