[GitHub] spark pull request: Unify the logic for column pruning, projection...

2014-03-24 Thread marmbrus
GitHub user marmbrus opened a pull request:

https://github.com/apache/spark/pull/213

Unify the logic for column pruning, projection, and filtering of table 
scans.

This removes a bunch of duplicated logic, dead code and casting when 
planning parquet table scans and hive table scans.

Other changes:
 - Fix tests now that we are doing a better job of column pruning (i.e.,
since pruning predicates are applied before we even start scanning tuples,
columns required by these predicates do not need to be included unless they are
also included in the final output of this logical plan fragment).
 - Add a rule to simplify trivial filters.  This was required to prevent
`WHERE false` from being pushed into table scans, since `HiveTableScan`
(reasonably) refuses to apply partition pruning predicates to non-partitioned
tables.
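
A minimal sketch of what a rule for simplifying trivial filters might look like, using a toy plan representation. The types `Expr`, `Plan`, `Scan`, `Filter` and `EmptyRelation` are hypothetical stand-ins, not Catalyst's actual classes, and the rule in this pull request may well differ:

```scala
object SimplifyFiltersSketch {
  // Toy expression and plan types (assumptions for illustration only).
  sealed trait Expr
  case object True extends Expr
  case object False extends Expr
  final case class GreaterThan(column: String, value: Int) extends Expr

  sealed trait Plan
  final case class Scan(table: String) extends Plan
  final case class Filter(condition: Expr, child: Plan) extends Plan
  case object EmptyRelation extends Plan

  // Removes filters whose condition is a constant, so that a literal
  // `false` never reaches the code that plans table scans.
  def simplify(plan: Plan): Plan = plan match {
    case Filter(True, child)  => simplify(child)            // always passes: drop the filter
    case Filter(False, _)     => EmptyRelation              // never passes: no rows at all
    case Filter(cond, child)  => Filter(cond, simplify(child))
    case other                => other
  }
}
```

With this shape, a `WHERE false` over any scan collapses to an empty relation before scan planning, so the scan operator never sees a non-partition predicate it would refuse.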

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marmbrus/spark strategyCleanup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/213.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #213


commit 0ae86cfcba56b700d8e7bd869379f0c663b21c1e
Author: Michael Armbrust mich...@databricks.com
Date:   2014-03-24T04:57:42Z

Unify the logic for column pruning, projection, and filtering of table 
scans for both Hive and Parquet relations.  Fix tests now that we are doing a 
better job of column pruning.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1133] add small files input in MLlib

2014-03-24 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/164#discussion_r10873319
  
--- Diff: mllib/src/main/java/org/apache/spark/mllib/input/WholeTextFileInputFormat.java ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.input;
+
+import java.io.IOException;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+import org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat;
+import org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader;
+import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;
+import org.apache.hadoop.mapreduce.RecordReader;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+
+/**
+ * The specific InputFormat reads files in HDFS or local disk into pair (filename, content) format.
+ * It will be called by HadoopRDD to generate new WholeTextFileRecordReader.
--- End diff --

Sorry I forgot this is Java. Then you should use `@link` instead of `[[]]`.




[GitHub] spark pull request: SPARK-1094 Support MiMa for reporting binary c...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/207#issuecomment-38414578
  
Build finished.




[GitHub] spark pull request: SPARK-1094 Support MiMa for reporting binary c...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/207#issuecomment-38414579
  
One or more automated tests failed
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13381/




[GitHub] spark pull request: Unify the logic for column pruning, projection...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/213#issuecomment-38414612
  
 Merged build triggered.




[GitHub] spark pull request: java.nio.charset.MalformedInputException

2014-03-24 Thread scwf
Github user scwf closed the pull request at:

https://github.com/apache/spark/pull/212




[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread qqsun8819
GitHub user qqsun8819 opened a pull request:

https://github.com/apache/spark/pull/214

[SPARK-1141] [WIP] Parallelize Task Serialization 

https://spark-project.atlassian.net/browse/SPARK-1141
@kayousterhout
Copied from JIRA (the design doc in JIRA is old; I'll update it later):
TaskSetManager.resourceOffer will return a TaskDescWithoutSerializeTask
object. This object is a partial copy of TaskDescription that omits the
_serializedTask ByteBuffer and instead holds the Task object itself. The
serialization step currently inside TaskSetManager.resourceOffer will move to a
Runnable in TaskSchedulerImpl that runs on a thread pool.

DriverSuite fails in my own environment; I am working on a fix.
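
A rough sketch of the idea described above: hand out descriptors that still hold the task object, then serialize them on a thread pool. All names here (`Task`, `UnserializedTaskDesc`, `TaskDesc`, `serialize`, `serializeAll`) are hypothetical stand-ins and not the classes touched by this pull request; the pool size and timeout are arbitrary assumptions:

```scala
import java.nio.ByteBuffer
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ParallelSerializeSketch {
  // Hypothetical stand-ins for the scheduler's types.
  final case class Task(id: Int, payload: String)
  final case class UnserializedTaskDesc(taskId: Int, task: Task)
  final case class TaskDesc(taskId: Int, serializedTask: ByteBuffer)

  // Placeholder serializer: a real scheduler would use its closure serializer.
  def serialize(task: Task): ByteBuffer =
    ByteBuffer.wrap(task.payload.getBytes("UTF-8"))

  // Serializes every descriptor on a fixed-size pool instead of on the
  // calling thread, and returns the results in the original order.
  def serializeAll(descs: Seq[UnserializedTaskDesc], threads: Int): Seq[TaskDesc] = {
    val pool = Executors.newFixedThreadPool(threads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val futures = descs.map(d => Future(TaskDesc(d.taskId, serialize(d.task))))
      Await.result(Future.sequence(futures), 30.seconds)
    } finally {
      pool.shutdown()
    }
  }
}
```

Blocking on the combined future keeps the sketch simple; a scheduler would more likely launch each task as soon as its own serialization completes.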

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/qqsun8819/spark task-serialize

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/214.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #214


commit 53795965dd16c54a4981ef4ee754f326663f9795
Author: Ouyang Jin jin@alibaba-inc.com
Date:   2014-03-16T15:57:43Z

Initial version of Parallelize Task Serialization in dev code, but this 
version has a chance to hang in multi-task execution and needs debug

commit 0bb37447d403c63b21b06cf15a612eb363c701da
Author: OuYang Jin jin@alibaba-inc.com
Date:   2014-03-23T14:47:56Z

Merge remote-tracking branch 'upstream/master' into task-serialize

commit 177195d20ddef34d339f6385d50382944c9c149d
Author: OuYang Jin jin@alibaba-inc.com
Date:   2014-03-24T06:16:27Z

Modify asychroniazed sleep wait to pass job running case






[GitHub] spark pull request: Fixed coding style issues in Spark SQL

2014-03-24 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/208#issuecomment-38415798
  
```scala
package org.apache.spark.sql
package catalyst
```
vs
```scala
package org.apache.spark.sql.catalyst
```

There are three reasons I think it'd be better to have just one package
statement in a file.

1. It is easier for most programmers to understand, especially those who
come from Java land (I was chatting with another committer just now and he was
confused by the meaning of having two or three package statements in a Scala
file).

2. It requires an explicit import to open up the parent package scope and
avoids polluting the namespace (there is no difference in line count
here, since you add one import but remove one package statement).

3. It is more consistent with the rest of the Spark code base.

Now this is a highly subjective topic, so we should get others to chime in.
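
The practical difference between the two forms can be seen in a small sketch. The package `org.example.sql` and the objects in it are hypothetical names, not Spark code, and the file-level package clauses are written as nested blocks only so that everything fits in one compilation unit:

```scala
package org.example.sql {
  object Defaults {
    val greeting: String = "hello"
  }
}

// Equivalent to `package org.example.sql` followed by `package catalyst`
// at the top of a file: members of org.example.sql are in scope implicitly.
package org.example.sql {
  package catalyst {
    object ChainedClause {
      def greeting: String = Defaults.greeting
    }
  }
}

// Equivalent to a single `package org.example.sql.catalyst` clause:
// the parent package's members must be imported explicitly.
package org.example.sql.catalyst {
  import org.example.sql.Defaults

  object SingleClause {
    def greeting: String = Defaults.greeting
  }
}
```

Removing the import from the last block makes `Defaults` unresolved there, which is the "explicit import" cost mentioned in point 2.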




[GitHub] spark pull request: Fixed coding style issues in Spark SQL

2014-03-24 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/208#issuecomment-38416577
  
@pwendell When I started on this PR, I was also puzzled about which option to
use at first, until I saw the same usage in the Scala standard library.  But I
should confess that I wasn't 100% sure exactly what the first option means
until I played around with both of them a bit.

I think the most significant advantage of the first option is that we
open parent packages implicitly. But since we forbid relative package imports,
this is not really an advantage any more. So I vote for the second option now.




[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/214#issuecomment-38416907
  
Can one of the admins verify this patch?




[GitHub] spark pull request: SPARK-1057: Upgrade fastutil to 6.5.11

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/215#issuecomment-38416906
  
Merged build started.




[GitHub] spark pull request: SPARK-1144 Added license and RAT to check lice...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/125#issuecomment-38416924
  
 Merged build triggered.




[GitHub] spark pull request: SPARK-1294 Fix resolution of uppercase field n...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/202#issuecomment-38416911
  
 Merged build triggered.




[GitHub] spark pull request: SPARK-1144 Added license and RAT to check lice...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/125#issuecomment-38416925
  
Merged build started.




[GitHub] spark pull request: SPARK-1294 Fix resolution of uppercase field n...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/202#issuecomment-38416912
  
Merged build started.




[GitHub] spark pull request: SPARK-1144 Added license and RAT to check lice...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/125#issuecomment-38417050
  
One or more automated tests failed
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13386/




[GitHub] spark pull request: SPARK-1057: Upgrade fastutil to 6.5.11

2014-03-24 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/215#issuecomment-38417092
  
LGTM - not sure about merging into 0.9.1 though.




[GitHub] spark pull request: Fixed coding style issues in Spark SQL

2014-03-24 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/208#issuecomment-38417357
  
I think a point here is whether we should assume all contributors use IDEs
like IntelliJ and their automated features. At least, I find that in most
scenarios the default behaviours of IntelliJ match the Spark coding convention
well. Exceptions include indentation and false-positive suggestions about
adding/removing parentheses to/from Java getter/setter methods.

Maybe we can suggest that developers (including ourselves) rely on IDEs more,
and even provide a default IntelliJ configuration that matches the Spark coding
convention better?




[GitHub] spark pull request: Fix SPARK-1280: Stage.name return apply at Op...

2014-03-24 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/180#issuecomment-38417874
  
Who can merge the improvement for the web UI?
@aarondav 




[GitHub] spark pull request: Fix SPARK-1280: Stage.name return apply at Op...

2014-03-24 Thread aarondav
Github user aarondav commented on the pull request:

https://github.com/apache/spark/pull/180#issuecomment-38418878
  
Thanks for this patch. Would you mind providing an example stack trace 
where this helped? I want to get a better sense of the issue to see if this is 
specific to Option or part of a more sinister problem.




[GitHub] spark pull request: [SPARK-1186] : Enrich the Spark Shell to suppo...

2014-03-24 Thread berngp
Github user berngp commented on the pull request:

https://github.com/apache/spark/pull/116#issuecomment-38418948
  
@aarondav I have provided the squashed commit with the agreed changes.

Below is the help message that we could have if we want to support the args
used by #86. Please share your thoughts.

```
${txtbld}Usage${txtrst}: spark-shell [OPTIONS]

${txtbld}OPTIONS${txtrst}:
-h  --help              : Print this help information.
--executor-memory       : The memory used by each executor of the Spark Shell, the number
                          is followed by m for megabytes or g for gigabytes, e.g. 1g.
--driver-memory         : The memory used by the Spark Shell, the number is followed
                          by m for megabytes or g for gigabytes, e.g. 1g, defaults to 512Mb.
--master                : A full string that describes the Spark Master, defaults to local
                          e.g. spark://localhost:7077.
--log-conf              : Enables logging of the supplied SparkConf as INFO at start of the
                          Spark Context.

${txtbld}Spark standalone with cluster deploy mode only${txtrst}:
--driver-cores          : Cores for driver.
--supervise             : Whether to restart the driver on failure.

${txtbld}Spark standalone and Mesos only${txtrst}:
--total-executor-cores  : CORES Total cores for all executors.

${txtbld}YARN-only${txtrst}:
--executor-cores        : Number of cores per executor (Default: 1).
--executor-memory       : Memory per executor (e.g. 1000M, 2G) (Default: 1G).
--queue QUEUE           : The YARN queue to submit the application to (Default: 'default').
--num-executors NUM     : Number of executors to start (Default: 2).
--files FILES           : Comma separated list of files to be placed next to all executors.
--archives ARCHIVES     : Comma separated list of archives to be extracted next to all executors.

e.g.
spark-shell -m spark://localhost:7077 -c 4 -dm 512m -em 2g
```
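
For illustration, the memory format described in the help text (a number followed by `m` or `g`) could be parsed along these lines. `MemoryArg` and `toMegabytes` are hypothetical names; this is not how spark-shell itself handles the value, and accepting upper-case units is an assumption:

```scala
object MemoryArg {
  // A run of digits followed by a single unit letter, e.g. "512m" or "1g".
  private val Pattern = """(\d+)([mMgG])""".r

  // Returns the size in megabytes, or None if the string does not match.
  // Very long digit runs that overflow a Long are not handled in this sketch.
  def toMegabytes(s: String): Option[Long] = s.trim match {
    case Pattern(digits, unit) =>
      val value = digits.toLong
      Some(if (unit.toLowerCase == "g") value * 1024L else value)
    case _ => None
  }
}
```

Returning `Option` rather than throwing lets a launcher script print the usage text when the argument is malformed.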




[GitHub] spark pull request: SPARK-1057: Upgrade fastutil to 6.5.11

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/215#issuecomment-38419532
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13384/




[GitHub] spark pull request: SPARK-1057: Upgrade fastutil to 6.5.11

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/215#issuecomment-38419531
  
Merged build finished.




[GitHub] spark pull request: SPARK-1294 Fix resolution of uppercase field n...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/202#issuecomment-38419522
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13385/




[GitHub] spark pull request: SPARK-1094 Support MiMa for reporting binary c...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/207#issuecomment-38419549
  
Build started.




[GitHub] spark pull request: SPARK-1094 Support MiMa for reporting binary c...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/207#issuecomment-38419548
  
 Build triggered.




[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/124#issuecomment-38419558
  
 Merged build triggered.




[GitHub] spark pull request: SPARK-1144 Added license and RAT to check lice...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/125#issuecomment-38419554
  
 Merged build triggered.




[GitHub] spark pull request: [SPARK-1186] : Enrich the Spark Shell to suppo...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/116#issuecomment-38419564
  
 Merged build triggered.




[GitHub] spark pull request: [SPARK-1186] : Enrich the Spark Shell to suppo...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/116#issuecomment-38419565
  
Merged build started.




[GitHub] spark pull request: SPARK-1094 Support MiMa for reporting binary c...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/207#issuecomment-38420087
  
One or more automated tests failed
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13387/




[GitHub] spark pull request: SPARK-1094 Support MiMa for reporting binary c...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/207#issuecomment-38420085
  
Build finished.




[GitHub] spark pull request: Fix SPARK-1280: Stage.name return apply at Op...

2014-03-24 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/180#issuecomment-38422713
  
val pairs = sc.parallelize(Array((1, 1), (1, 2), (1, 3), (2, 1)))
pairs.take(1)

http://host:4040/stages/
In the Completed Stages table, the Description column shows: apply at Option.scala:120

Option.scala:120 is:

@inline final def getOrElse[B >: A](default: => B): B =
  if (isEmpty) default else this.get
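
One way such a label could arise, sketched under the assumption that the stage description is built from the last framework-internal method name and the first stack frame outside the framework's packages. `CallSiteSketch`, `callSite` and the prefix lists are hypothetical; this is not Spark's actual call-site code, and the frames used below are made up:

```scala
object CallSiteSketch {
  // `trace` is ordered innermost frame first, as Thread.getStackTrace returns it.
  def callSite(trace: Seq[StackTraceElement], internalPrefixes: Seq[String]): String = {
    def isInternal(f: StackTraceElement): Boolean =
      internalPrefixes.exists(p => f.getClassName.startsWith(p))

    // Leading internal frames, then everything from the first "user" frame on.
    val (internal, rest) = trace.span(isInternal)
    val method = internal.lastOption.map(_.getMethodName).getOrElse("<unknown>")
    rest.headOption match {
      case Some(f) => s"$method at ${f.getFileName}:${f.getLineNumber}"
      case None    => s"$method at <unknown>"
    }
  }

  // Made-up frames: a by-name argument evaluated inside Option.getOrElse.
  val exampleTrace: Seq[StackTraceElement] = Seq(
    new StackTraceElement("org.apache.spark.util.Utils$", "getCallSite", "Utils.scala", 1),
    new StackTraceElement("org.apache.spark.SparkContext$$anonfun$1", "apply", "SparkContext.scala", 2),
    new StackTraceElement("scala.Option", "getOrElse", "Option.scala", 120),
    new StackTraceElement("org.apache.spark.rdd.RDD", "take", "RDD.scala", 3),
    new StackTraceElement("UserApp$", "main", "UserApp.scala", 10)
  )
}
```

With only `org.apache.spark` treated as internal, the `scala.Option` frame is mistaken for user code and the label becomes `apply at Option.scala:120`; treating `scala.` as internal too yields `take at UserApp.scala:10` for these frames.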


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1186] : Enrich the Spark Shell to suppo...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/116#issuecomment-38423316
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13390/




[GitHub] spark pull request: SPARK-1144 Added license and RAT to check lice...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/125#issuecomment-38423311
  
Merged build finished.




[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/124#issuecomment-38423314
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13389/




[GitHub] spark pull request: SPARK-1144 Added license and RAT to check lice...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/125#issuecomment-38423315
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13388/




[GitHub] spark pull request: Adding an option to persist Spark RDD blocks ...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/158#issuecomment-38423375
  
Merged build started.




[GitHub] spark pull request: Adding an option to persist Spark RDD blocks ...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/158#issuecomment-38423374
  
 Merged build triggered.




[GitHub] spark pull request: Unify the logic for column pruning, projection...

2014-03-24 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/213#issuecomment-38423970
  
Two unused imports are left in `HiveStrategies.scala`.




[GitHub] spark pull request: [SPARK-1133] add small files input in MLlib

2014-03-24 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/164#discussion_r10877199
  
--- Diff: 
mllib/src/main/java/org/apache/spark/mllib/input/WholeTextFileRecordReader.java 
---
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.input;
+
+import java.io.IOException;
+
+import com.google.common.io.Closeables;
+import org.apache.commons.io.IOUtils;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit;
+import org.apache.hadoop.mapreduce.RecordReader;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+
+/**
+ * An <code>org.apache.hadoop.mapreduce.RecordReader</code> for reading whole text file out in
+ * (filename, content) format. Each element in split is an record of a unique, whole file. File name
+ * is the full path name for easy deduplicate.
+ */
+public class WholeTextFileRecordReader extends RecordReader<String, Text> {
+  private Path path;
+
+  private String key = null;
+  private Text value = null;
+
+  private boolean processed = false;
+
+  private FileSystem fs;
+
+  public WholeTextFileRecordReader(
+  CombineFileSplit split,
+  TaskAttemptContext context,
+  Integer index)
+throws IOException {
+path = split.getPath(index);
+fs = path.getFileSystem(context.getConfiguration());
+  }
+
+  @Override
+  public void initialize(InputSplit arg0, TaskAttemptContext arg1)
+throws IOException, InterruptedException {
+  }
+
+  @Override
+  public void close() throws IOException {
+  }
+
+  @Override
+  public float getProgress() throws IOException {
+return processed ? 1.0f : 0.0f;
+  }
+
+  @Override
+  public String getCurrentKey() throws IOException, InterruptedException {
+return key;
+  }
+
+  @Override
+  public Text getCurrentValue() throws IOException, InterruptedException{
+return value;
+  }
+
+  @Override
+  public boolean nextKeyValue() throws IOException {
+if (!processed) {
+  if (key == null) {
+key = path.toString();
+  }
+  if (value == null) {
+value = new Text();
+  }
+
+  FSDataInputStream fileIn = null;
+  try {
+fileIn = fs.open(path);
+byte[] innerBuffer = IOUtils.toByteArray(fileIn);
--- End diff --

@mengxr @yinxusen PS I am happy to take on removal of use of commons-io in 
favor of equivalents in Guava. There is a bit more than this usage, but it's 
easy stuff. Commons IO is fine but it's not necessary to use it here and it is 
one of those dependencies that could collide with other versions in Hadoop. If 
anyone nods I'll open a separate PR.
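
For reference, the `IOUtils.toByteArray(fileIn)` call in the diff only drains a stream into a byte array. Below is a minimal sketch of that operation using nothing but the JDK; the class name `WholeStreamReader` is made up for illustration and is not part of the PR. Guava's `ByteStreams.toByteArray(InputStream)` does the equivalent in a single call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the PR: reads an entire stream into memory.
final class WholeStreamReader {
  private WholeStreamReader() {}

  /** Drains the stream into a byte array; the caller stays responsible for closing it. */
  static byte[] toByteArray(InputStream in) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[8192];
    int n;
    // read() returns -1 at end of stream
    while ((n = in.read(buffer)) != -1) {
      out.write(buffer, 0, n);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "whole file content".getBytes(StandardCharsets.UTF_8);
    try (InputStream in = new ByteArrayInputStream(data)) {
      System.out.println(toByteArray(in).length);
    }
  }
}
```

Either variant buffers the whole file in memory, which matches the (filename, content) record format described in the Javadoc above.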




[GitHub] spark pull request: Unify the logic for column pruning, projection...

2014-03-24 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/213#issuecomment-38428081
  
LGTM, much cleaner :)




[GitHub] spark pull request: [SPARK-1210] Prevent ContextClassLoader of Act...

2014-03-24 Thread ueshin
Github user ueshin commented on the pull request:

https://github.com/apache/spark/pull/15#issuecomment-38432140
  
Added a test case.




[GitHub] spark pull request: [SPARK-1210] Prevent ContextClassLoader of Act...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/15#issuecomment-38432384
  
 Build triggered.




[GitHub] spark pull request: [SPARK-1210] Prevent ContextClassLoader of Act...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/15#issuecomment-38432385
  
Build started.




[GitHub] spark pull request: [SPARK-1133] add small files input in MLlib

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/164#issuecomment-38436606
  
Build started.




[GitHub] spark pull request: [SPARK-1210] Prevent ContextClassLoader of Act...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/15#issuecomment-38436529
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13392/




[GitHub] spark pull request: SPARK-1235: kill the application when DAGSched...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/186#issuecomment-38436603
  
Merged build started.




[GitHub] spark pull request: [SPARK-1133] add small files input in MLlib

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/164#issuecomment-38436605
  
 Build triggered.




[GitHub] spark pull request: [SPARK-1210] Prevent ContextClassLoader of Act...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/15#issuecomment-38436528
  
Build finished.




[GitHub] spark pull request: Fixed coding style issues in Spark SQL

2014-03-24 Thread alig
Github user alig commented on the pull request:

https://github.com/apache/spark/pull/208#issuecomment-38439045
  
Also +1 for ```package org.apache.spark.sql.catalyst```, just because it's 
simpler to understand for the majority of the programmers in the world ;)




[GitHub] spark pull request: [SPARK-1198] Allow pipes tasks to run in diffe...

2014-03-24 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/128#issuecomment-38440744
  
Yes, I can add something, although I don't really have a way to test it.

Note that my original question was how we want to go about adding support
for Windows/Linux-specific shell commands. Borrowing from Hadoop, we could
create some generic classes like UnixShellScriptBuilder and
WindowsShellScriptBuilder, then instantiate the correct one based on the OS
type. Or, since this is only one place, I could just conditionalize it there,
and if we add more shell commands it can be made generic later.
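
A rough sketch of the first option, to make the trade-off concrete. Only the two class names come from the comment above; the abstract base class, the `forOs` factory, and the `changeDir`/`line` methods are assumptions made up for illustration and are not Hadoop's or Spark's actual API.

```java
import java.util.Locale;

// Hypothetical base class: accumulates script lines with an OS-specific line ending.
abstract class ShellScriptBuilder {
  private final StringBuilder script = new StringBuilder();

  /** Picks an implementation from an OS name such as the "os.name" system property. */
  static ShellScriptBuilder forOs(String osName) {
    boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
    return windows ? new WindowsShellScriptBuilder() : new UnixShellScriptBuilder();
  }

  /** Appends a command that changes the working directory. */
  abstract ShellScriptBuilder changeDir(String dir);

  abstract String lineEnding();

  /** Appends one raw line, terminated with the OS-specific line ending. */
  final ShellScriptBuilder line(String text) {
    script.append(text).append(lineEnding());
    return this;
  }

  final String build() {
    return script.toString();
  }
}

final class UnixShellScriptBuilder extends ShellScriptBuilder {
  @Override ShellScriptBuilder changeDir(String dir) { return line("cd \"" + dir + "\""); }
  @Override String lineEnding() { return "\n"; }
}

final class WindowsShellScriptBuilder extends ShellScriptBuilder {
  // "cd /d" also switches the drive letter, which plain "cd" does not do on Windows.
  @Override ShellScriptBuilder changeDir(String dir) { return line("cd /d \"" + dir + "\""); }
  @Override String lineEnding() { return "\r\n"; }
}

final class ShellScriptBuilderDemo {
  public static void main(String[] args) {
    String script = ShellScriptBuilder.forOs(System.getProperty("os.name"))
        .changeDir("work")
        .line("echo ready")
        .build();
    System.out.print(script);
  }
}
```

The second option in the comment amounts to inlining the `forOs` check at the single call site, which is less code as long as there is only one such command.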




[GitHub] spark pull request: [SPARK-1133] add small files input in MLlib

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/164#issuecomment-38441593
  
Build finished.




[GitHub] spark pull request: SPARK-1235: kill the application when DAGSched...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/186#issuecomment-38441595
  
One or more automated tests failed
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13393/




[GitHub] spark pull request: SPARK-1235: kill the application when DAGSched...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/186#issuecomment-38441592
  
Merged build finished.




[GitHub] spark pull request: SPARK-1235: kill the application when DAGSched...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/186#issuecomment-38441655
  
 Merged build triggered.




[GitHub] spark pull request: SPARK-1235: kill the application when DAGSched...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/186#issuecomment-38441656
  
Merged build started.




[GitHub] spark pull request: SPARK-1235: kill the application when DAGSched...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/186#issuecomment-38441791
  
Merged build finished.




[GitHub] spark pull request: SPARK-1235: kill the application when DAGSched...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/186#issuecomment-38441793
  
One or more automated tests failed
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13395/




[GitHub] spark pull request: SPARK-1235: kill the application when DAGSched...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/186#issuecomment-38441957
  
One or more automated tests failed
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13396/




[GitHub] spark pull request: SPARK-1235: kill the application when DAGSched...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/186#issuecomment-38441955
  
Merged build finished.




[GitHub] spark pull request: SPARK-1235: kill the application when DAGSched...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/186#issuecomment-38442090
  
 Merged build triggered.




[GitHub] spark pull request: SPARK-1235: kill the application when DAGSched...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/186#issuecomment-38442093
  
Merged build started.




[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread CodingCat
Github user CodingCat commented on a diff in the pull request:

https://github.com/apache/spark/pull/214#discussion_r10884042
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -93,6 +96,10 @@ private[spark] class TaskSchedulerImpl(
   val mapOutputTracker = SparkEnv.get.mapOutputTracker
 
   var schedulableBuilder: SchedulableBuilder = null
+
+  private val serializeWorkerPool = new ThreadPoolExecutor(20, 60, 60, 
TimeUnit.SECONDS,
--- End diff --

also KeepAliveTime




[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread CodingCat
Github user CodingCat commented on a diff in the pull request:

https://github.com/apache/spark/pull/214#discussion_r10884203
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -243,12 +275,18 @@ private[spark] class TaskSchedulerImpl(
   } while (launchedTask)
 }
 
+do {
+  Thread.sleep(1)
+} while(!serializingTask.isEmpty) 
+
 if (tasks.size > 0) {
   hasLaunchedTask = true
 }
 return tasks
   }
 
+
+
--- End diff --

extra line




[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread CodingCat
Github user CodingCat commented on a diff in the pull request:

https://github.com/apache/spark/pull/214#discussion_r10884189
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -243,12 +275,18 @@ private[spark] class TaskSchedulerImpl(
   } while (launchedTask)
 }
 
+do {
+  Thread.sleep(1)
+} while(!serializingTask.isEmpty) 
+
 if (tasks.size > 0) {
   hasLaunchedTask = true
 }
 return tasks
   }
 
+
--- End diff --

extra line




[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread CodingCat
Github user CodingCat commented on a diff in the pull request:

https://github.com/apache/spark/pull/214#discussion_r10884300
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -219,18 +226,43 @@ private[spark] class TaskSchedulerImpl(
 taskSet.parent.name, taskSet.name, taskSet.runningTasks))
 }
 
+val ser = SparkEnv.get.closureSerializer.newInstance()
 // Take each TaskSet in our scheduling order, and then offer it each 
node in increasing order
 // of locality levels so that it gets a chance to launch local tasks 
on all of them.
 var launchedTask = false
+val serializingTask = new HashSet[Long]
--- End diff --

Since you are not fetching the serialized task from this HashSet, but just
using taskDesc in L250, can we replace it with an integer? Then in L280, keep
sleeping while this integer is not zero.




[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread CodingCat
Github user CodingCat commented on a diff in the pull request:

https://github.com/apache/spark/pull/214#discussion_r10884378
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -219,18 +226,43 @@ private[spark] class TaskSchedulerImpl(
 taskSet.parent.name, taskSet.name, taskSet.runningTasks))
 }
 
+val ser = SparkEnv.get.closureSerializer.newInstance()
 // Take each TaskSet in our scheduling order, and then offer it each 
node in increasing order
 // of locality levels so that it gets a chance to launch local tasks 
on all of them.
 var launchedTask = false
+val serializingTask = new HashSet[Long]
--- End diff --

oh, some concurrency issue here




[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread CodingCat
Github user CodingCat commented on a diff in the pull request:

https://github.com/apache/spark/pull/214#discussion_r10884645
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -30,6 +30,9 @@ import scala.util.Random
 import org.apache.spark._
 import org.apache.spark.TaskState.TaskState
 import org.apache.spark.scheduler.SchedulingMode.SchedulingMode
+import org.apache.spark.util.Utils
+import scala.collection.mutable
--- End diff --

Do you mind adjusting the order of the import statements? See the
Contributing to Spark wiki page.




[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread CodingCat
Github user CodingCat commented on a diff in the pull request:

https://github.com/apache/spark/pull/214#discussion_r10884928
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -219,18 +226,43 @@ private[spark] class TaskSchedulerImpl(
 taskSet.parent.name, taskSet.name, taskSet.runningTasks))
 }
 
+val ser = SparkEnv.get.closureSerializer.newInstance()
 // Take each TaskSet in our scheduling order, and then offer it each 
node in increasing order
 // of locality levels so that it gets a chance to launch local tasks 
on all of them.
 var launchedTask = false
+val serializingTask = new HashSet[Long]
--- End diff --

just take care of the concurrency issue here




[GitHub] spark pull request: SPARK-1235: kill the application when DAGSched...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/186#issuecomment-38448296
  
Merged build finished.




[GitHub] spark pull request: SPARK-1235: kill the application when DAGSched...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/186#issuecomment-38448297
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13397/




[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread qqsun8819
Github user qqsun8819 commented on a diff in the pull request:

https://github.com/apache/spark/pull/214#discussion_r10886619
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -219,18 +226,43 @@ private[spark] class TaskSchedulerImpl(
 taskSet.parent.name, taskSet.name, taskSet.runningTasks))
 }
 
+val ser = SparkEnv.get.closureSerializer.newInstance()
 // Take each TaskSet in our scheduling order, and then offer it each 
node in increasing order
 // of locality levels so that it gets a chance to launch local tasks 
on all of them.
 var launchedTask = false
+val serializingTask = new HashSet[Long]
--- End diff --

Good point here, I'll try to replace it with AtomicLong
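
The counter idea from this review thread could look roughly like the sketch below. It is written in Java rather than the PR's Scala, and the class name, pool size, and stand-in "work" are all assumptions for illustration, not the PR's code. An `AtomicLong` tracks how many submitted tasks are still in flight, and the caller polls until it reaches zero, mirroring the `Thread.sleep(1)` loop in the diff.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: wait for background work using an atomic in-flight counter.
final class PendingCounterSketch {
  private PendingCounterSketch() {}

  /** Runs taskCount trivial tasks on a pool and returns how many completed. */
  static long runAll(int taskCount) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    AtomicLong pending = new AtomicLong(0);
    AtomicLong completed = new AtomicLong(0);
    try {
      for (int i = 0; i < taskCount; i++) {
        // Increment before submitting so the waiter can never observe a false zero.
        pending.incrementAndGet();
        pool.execute(() -> {
          try {
            completed.incrementAndGet(); // stand-in for serializing one task
          } finally {
            pending.decrementAndGet(); // always release, even if the work throws
          }
        });
      }
      // Poll until every submitted task has finished.
      while (pending.get() != 0) {
        Thread.sleep(1);
      }
    } finally {
      pool.shutdown();
    }
    return completed.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(runAll(100));
  }
}
```

A `java.util.concurrent.CountDownLatch` would remove the polling loop when the number of tasks is known up front; the counter is shown here because it is what the thread discusses.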




[GitHub] spark pull request: SPARK-1094 Support MiMa for reporting binary c...

2014-03-24 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/207#issuecomment-38462860
  
Jenkins, test this please.




[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread qqsun8819
Github user qqsun8819 commented on the pull request:

https://github.com/apache/spark/pull/214#issuecomment-38463531
  
Fixed the DriverSuite test failure.
Moved the thread pool inside resourceOffers and shut it down before the method returns.
Made some other fixes according to @CodingCat's review.
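
The "create the pool inside the method and shut it down before returning" pattern can be sketched as follows. This is an illustrative Java example, not the PR's Scala code: the class name, the pool size, and the use of string byte lengths as stand-in "serialization" are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: a thread pool whose lifetime is scoped to one method call.
final class ScopedPoolSketch {
  private ScopedPoolSketch() {}

  /** "Serializes" each task name in parallel and returns the sizes in input order. */
  static List<Integer> serializeAll(List<String> tasks)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      List<Callable<Integer>> jobs = new ArrayList<>();
      for (String task : tasks) {
        jobs.add(() -> task.getBytes(StandardCharsets.UTF_8).length);
      }
      List<Integer> sizes = new ArrayList<>();
      // invokeAll blocks until every job is done and preserves submission order.
      for (Future<Integer> result : pool.invokeAll(jobs)) {
        sizes.add(result.get());
      }
      return sizes;
    } finally {
      // Shut down on every exit path so no worker threads outlive the call.
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(serializeAll(Arrays.asList("task-1", "task-22")));
  }
}
```

The cost of this design is that threads are created and torn down on every call, so it trades some per-call overhead for not leaving idle worker threads behind.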




[GitHub] spark pull request: Fixed coding style issues in Spark SQL

2014-03-24 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/208#issuecomment-38466998
  
Thanks for all your votes! I'll fix this in a separate PR ASAP.




[GitHub] spark pull request: Spark parquet improvements

2014-03-24 Thread AndreSchumacher
Github user AndreSchumacher commented on a diff in the pull request:

https://github.com/apache/spark/pull/195#discussion_r10892494
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetRelation.scala ---
@@ -72,16 +73,43 @@ case class ParquetRelation(val tableName: String, val 
path: String) extends Base
   /** Output **/
   override val output = attributes
 
+  /** Name (dummy value) */
+  // TODO: rethink whether ParquetRelation should inherit from BaseRelation
+  // (currently required to re-use HiveStrategies but should be removed)
+  override def tableName = "parquet"
+
   // Parquet files have no concepts of keys, therefore no Partitioner
   // Note: we could allow Block level access; needs to be thought through
   override def isPartitioned = false
 }
 
 object ParquetRelation {
+  // change this to enable/disable Parquet logging
+  var DEBUG: Boolean = false
+
+  // TODO: consider redirecting Parquet's log output to log4j logger and
+  // using config file for log settings
+  def setParquetLogLevel() {
+val level: Level = if (DEBUG) Level.FINEST else Level.WARNING
--- End diff --

I'm now reading this: "j.u.l. to SLF4J translation can seriously increase
the cost of disabled logging statements (60 fold or 6000%)". Apparently there
is a way to avoid this by using logback (a fork of log4j?). Parquet does fairly
low-level logging and, as I understand it, relies on these statements not being
compiled in. Any opinions? I could check whether it would work via logback, or
measure how much this would degrade performance.
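
To make the cost concern concrete, here is a small sketch of level selection and guarded logging with plain `java.util.logging`, written in Java rather than the PR's Scala. The logger name `parquet.sketch` and all method names are invented for illustration. It shows a run-time guard only; it does not reproduce the j.u.l. to SLF4J bridge or a compile-time constant check.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: choose a j.u.l. level and skip expensive messages when disabled.
final class JulLevelSketch {
  static int expensiveCalls = 0;

  private JulLevelSketch() {}

  /** Same choice as in the diff: FINEST when debugging, WARNING otherwise. */
  static Level levelFor(boolean debug) {
    return debug ? Level.FINEST : Level.WARNING;
  }

  /** Stand-in for a message that is costly to build. */
  static String expensiveDump() {
    expensiveCalls++;
    return "row group details";
  }

  static void logDetails(Logger log) {
    // The guard keeps the message from being built when FINE is disabled.
    if (log.isLoggable(Level.FINE)) {
      log.fine(expensiveDump());
    }
  }

  public static void main(String[] args) {
    Logger log = Logger.getLogger("parquet.sketch");
    log.setUseParentHandlers(false); // keep the demo quiet
    log.setLevel(levelFor(false));
    logDetails(log);
    System.out.println(expensiveCalls);
  }
}
```

With the level at WARNING, the guarded statement costs one level comparison and `expensiveDump()` is never called.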




[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/216#issuecomment-38471741
  
Can one of the admins verify this patch?




[GitHub] spark pull request: SPARK-1094 Support MiMa for reporting binary c...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/207#issuecomment-38471725
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13398/




[GitHub] spark pull request: SPARK-1094 Support MiMa for reporting binary c...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/207#issuecomment-38471724
  
Build finished.




[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-24 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/216#issuecomment-38472118
  
Jenkins, test this please.




[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/216#issuecomment-38472286
  
Merged build started.




[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/216#issuecomment-38472412
  
Merged build finished.




[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/216#issuecomment-38472415
  
One or more automated tests failed
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13400/




[GitHub] spark pull request: Adding an option to persist Spark RDD blocks ...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/158#issuecomment-38473429
  
Merged build finished.




[GitHub] spark pull request: SPARK-1235: kill the application when DAGSched...

2014-03-24 Thread CodingCat
Github user CodingCat commented on the pull request:

https://github.com/apache/spark/pull/186#issuecomment-38476549
  
I adjusted the code to capture the exception inside the processEvent 
function, so that we can easily test it in DAGSchedulerSuite.






[GitHub] spark pull request: Adding an option to persist Spark RDD blocks ...

2014-03-24 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/158#issuecomment-38476606
  
@RongGu I just created a JIRA for this:

https://spark-project.atlassian.net/browse/SPARK-1305

Do you mind updating the title here to start with "SPARK-1305:"? Also, can 
you create an account on the Spark JIRA so I can assign you as the contributor 
of this feature? Thanks!




[GitHub] spark pull request: SPARK-1094 Support MiMa for reporting binary c...

2014-03-24 Thread aarondav
Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/207#discussion_r10896563
  
--- Diff: 
tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala ---
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.tools
+
+import java.io.File
+import java.util.jar.JarFile
+
+import scala.collection.mutable
+import scala.collection.JavaConversions._
+import scala.reflect.runtime.universe.runtimeMirror
+
+/**
+ * A tool for generating classes to be excluded during binary checking 
with MIMA. It is expected
+ * that this tool is run with ./spark-class.
+ *
+ * MIMA itself only supports JVM-level visibility and doesn't account for 
package-private classes.
+ * This tool looks at all currently package-private classes and generates 
exclusions for them. Note
+ * that this approach is not sound. It can lead to false positives if we 
move or rename a previously
+ * package-private class. It can lead to false negatives if someone 
explicitly makes a class
+ * package-private that wasn't before. This exists only to help catch 
certain classes of changes
+ * which might be difficult to catch during review.
+ */
+object GenerateMIMAIgnore {
+  private val classLoader = Thread.currentThread().getContextClassLoader
+  private val mirror = runtimeMirror(classLoader)
+
+  private def classesPrivateWithin(packageName: String): Set[String] = {
+
+val classes = getClasses(packageName, classLoader)
+val privateClasses = mutable.HashSet[String]()
+
+def isPackagePrivate(className: String) = {
+  try {
+/* Couldn't figure out if it's possible to determine a-priori 
whether a given symbol
+   is a module or class. */
+
+val privateAsClass = mirror
+  .staticClass(className)
+  .privateWithin
+  .fullName
+  .startsWith(packageName)
+
+val privateAsModule = mirror
+  .staticModule(className)
+  .privateWithin
+  .fullName
+  .startsWith(packageName)
+
+privateAsClass || privateAsModule
+  } catch {
+case _: Throwable => {
+  println("Error determining visibility: " + className)
+  false
+}
+  }
+}
+
+for (className <- classes) {
+  val directlyPrivateSpark = isPackagePrivate(className)
+
+  /* Inner classes defined within a private[spark] class or object are 
effectively
+ invisible, so we account for them as package private. */
+  val indirectlyPrivateSpark = {
+val maybeOuter = className.toString.takeWhile(_ != '$')
+if (maybeOuter != className) {
+  isPackagePrivate(maybeOuter)
+} else {
+  false
+}
+  }
+  if (directlyPrivateSpark || indirectlyPrivateSpark) privateClasses 
+= className
+}
+privateClasses.flatMap(c => Seq(c, c.replace("$", "#"))).toSet
+  }
+
+  def main(args: Array[String]) {
+scala.tools.nsc.io.File(".mima-excludes").
+  writeAll(classesPrivateWithin("org.apache.spark").mkString("\n"))
+println("Created : .mima-excludes in current directory.")
+  }
+
+
+  private def shouldExclude(name: String) = {
+// Heuristic to remove JVM classes that do not correspond to 
user-facing classes in Scala
+name.contains("anon") ||
--- End diff --

I keep trying to come up with a valid class name that contains "anon". In 
the dictionary, there seems to be "Canon", "Lebanon", and, of course, 
"anonymous". We're probably safe?
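To make the concern concrete, here is a small sketch comparing the substring check with a narrower one. The object and method names are hypothetical, and the narrower check is only an assumption based on how scalac typically names anonymous classes and functions (`Outer$$anon$1`, `Outer$$anonfun$1`); it is not what the PR does.

```scala
// Hypothetical comparison, not code from the pull request.
object AnonNameCheck {
  // The heuristic under review: any occurrence of "anon" excludes the class.
  def looseMatch(name: String): Boolean = name.contains("anon")

  // Narrower variant: require the '$' that precedes compiler-generated
  // "anon" / "anonfun" name segments.
  def strictMatch(name: String): Boolean = name.contains("$anon")
}
```

A user-facing class such as `org.example.Canon` is excluded by `looseMatch` but kept by `strictMatch`, while `org.example.Foo$$anon$1` is excluded by both.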



[GitHub] spark pull request: SPARK-1305: Support persisting RDD's directly ...

2014-03-24 Thread RongGu
Github user RongGu commented on the pull request:

https://github.com/apache/spark/pull/158#issuecomment-38478522
  
@pwendell , I've updated the title of this PR with prefix 'SPARK-1305:'. 
Also, I've created my account on the Spark JIRA. It's named RongGu.




[GitHub] spark pull request: SPARK-1128: set hadoop task properties when co...

2014-03-24 Thread aarondav
Github user aarondav commented on the pull request:

https://github.com/apache/spark/pull/101#issuecomment-38478514
  
Looks good to me. Merged into master - thanks!




[GitHub] spark pull request: SPARK-1128: set hadoop task properties when co...

2014-03-24 Thread aarondav
Github user aarondav commented on the pull request:

https://github.com/apache/spark/pull/101#issuecomment-38478659
  
Oops, spoke too soon: some changes have apparently made this PR not cleanly 
mergeable. Mind doing a rebase?




[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-24 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/216#discussion_r10897487
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/discretization/EMDDiscretizer.scala 
---
@@ -0,0 +1,402 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements. See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the License); you may not use this file except in compliance with
+* the License. You may obtain a copy of the License at
+*
+* http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an AS IS BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.mllib.discretization
+
+import scala.collection.mutable.Stack
+import org.apache.spark.SparkContext._
+import org.apache.spark.mllib.util.InfoTheory
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.mllib.regression.LabeledPoint
+import scala.collection.mutable
+
+
+/**
+ * This class contains methods to discretize continuous values with the 
method proposed in
+ * [Fayyad and Irani, Multi-Interval Discretization of Continuous-Valued 
Attributes, 1993]
+ */
+class EMDDiscretizer private (
--- End diff --

The name `EMD` is not a common acronym for the algorithm. `MDLP` was used 
in the paper and `MDL` was used in derived work. But I do think `MDL` is more 
confusing. Shall we call it `EntropyMinimizationDiscretizer`?




[GitHub] spark pull request: SPARK-1305: Support persisting RDD's directly ...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/158#issuecomment-38479671
  
One or more automated tests failed
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13399/




[GitHub] spark pull request: SPARK-1094 Support MiMa for reporting binary c...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/207#issuecomment-38479779
  
 Build triggered.




[GitHub] spark pull request: SPARK-1305: Support persisting RDD's directly ...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/158#issuecomment-38479800
  
Merged build started.




[GitHub] spark pull request: SPARK-1094 Support MiMa for reporting binary c...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/207#issuecomment-38479780
  
Build started.




[GitHub] spark pull request: SPARK-1305: Support persisting RDD's directly ...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/158#issuecomment-38479965
  
Merged build finished.




[GitHub] spark pull request: SPARK-1094 Support MiMa for reporting binary c...

2014-03-24 Thread aarondav
Github user aarondav commented on the pull request:

https://github.com/apache/spark/pull/207#issuecomment-38480096
  
Looks good to me!




[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-24 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/216#discussion_r10897825
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/discretization/MapAccumulator.scala 
---
@@ -0,0 +1,53 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements. See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the License); you may not use this file except in compliance with
+* the License. You may obtain a copy of the License at
+*
+* http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an AS IS BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.mllib.discretization
+
+import org.apache.spark.AccumulatorParam
+
+object MapAccumulator extends AccumulatorParam[Map[String, Int]] {
--- End diff --

Mark it package private if this is not intended to be used by users.
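As a rough illustration of the suggestion, the sketch below restricts an accumulator-style helper with a `private[...]` qualifier. To stay self-contained it uses an enclosing object named `discretization` as a stand-in for the real package and does not extend Spark's `AccumulatorParam`; `MapAccumulatorSketch` and `mergeCounts` are hypothetical names.

```scala
// Stand-in for the package; in Spark this would be a `package` clause.
object discretization {

  // Accessible only from code inside `discretization`.
  private[discretization] object MapAccumulatorSketch {
    def zero(initial: Map[String, Int]): Map[String, Int] = Map.empty

    // Merge two count maps by summing the values of shared keys.
    def addInPlace(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
      b.foldLeft(a) { case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, 0) + v) }
  }

  // Public entry point that uses the restricted helper internally.
  def mergeCounts(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    MapAccumulatorSketch.addInPlace(a, b)
}
```

Callers outside the enclosing scope can use `mergeCounts` but cannot reference `MapAccumulatorSketch` directly, which is the effect the comment asks for.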




[GitHub] spark pull request: SPARK-1305: Support persisting RDD's directly ...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/158#issuecomment-38480623
  
 Merged build triggered.



