[jira] [Commented] (SPARK-12181) Check Cached unaligned-access capability before using Unsafe

2015-12-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15045309#comment-15045309
 ] 

Ted Yu commented on SPARK-12181:


We can add a method to Platform which performs a check on the architecture.

In MemoryManager.scala, when "spark.unsafe.offHeap" is true but the above check
doesn't pass, raise an exception.
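
A minimal sketch of what such a check could look like, assuming a hypothetical
helper on Platform and the architecture list quoted in the issue description
(this is a sketch, not the merged implementation):
{code}
object PlatformCheck {
  // Architectures the JDK treats as unaligned-access capable, per the
  // "Cached unaligned-access capability" comment in the Oracle code.
  private val unalignedArchs = Set("i386", "x86", "amd64", "x86_64")

  val unaligned: Boolean =
    unalignedArchs.contains(System.getProperty("os.arch", ""))

  // For MemoryManager: fail fast when off-heap mode is requested but the
  // architecture does not support unaligned access.
  def requireUnaligned(): Unit = {
    if (!unaligned) {
      throw new UnsupportedOperationException(
        "spark.unsafe.offHeap requires an unaligned-access capable architecture")
    }
  }
}
{code}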

Comments are welcome.

> Check Cached unaligned-access capability before using Unsafe
> 
>
> Key: SPARK-12181
> URL: https://issues.apache.org/jira/browse/SPARK-12181
> Project: Spark
>  Issue Type: Bug
>Reporter: Ted Yu
>
> For MemoryMode.OFF_HEAP, Unsafe.getInt etc. are used with no restriction.
> However, the Oracle implementation uses these methods only if the class
> variable unaligned (commented as "Cached unaligned-access capability") is
> true, which seems to be calculated based on whether the architecture is i386,
> x86, amd64, or x86_64.
> I think we should perform a similar check for the use of Unsafe.



[jira] [Created] (SPARK-12181) Check Cached unaligned-access capability before using Unsafe

2015-12-07 Thread Ted Yu (JIRA)
Ted Yu created SPARK-12181:
--

 Summary: Check Cached unaligned-access capability before using 
Unsafe
 Key: SPARK-12181
 URL: https://issues.apache.org/jira/browse/SPARK-12181
 Project: Spark
  Issue Type: Bug
Reporter: Ted Yu


For MemoryMode.OFF_HEAP, Unsafe.getInt etc. are used with no restriction.

However, the Oracle implementation uses these methods only if the class
variable unaligned (commented as "Cached unaligned-access capability") is true,
which seems to be calculated based on whether the architecture is i386, x86,
amd64, or x86_64.

I think we should perform a similar check for the use of Unsafe.



[jira] [Created] (SPARK-12074) Avoid memory copy involving ByteBuffer.wrap(ByteArrayOutputStream.toByteArray)

2015-12-01 Thread Ted Yu (JIRA)
Ted Yu created SPARK-12074:
--

 Summary: Avoid memory copy involving 
ByteBuffer.wrap(ByteArrayOutputStream.toByteArray)
 Key: SPARK-12074
 URL: https://issues.apache.org/jira/browse/SPARK-12074
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Ted Yu


SPARK-12060 fixed JavaSerializerInstance.serialize.
This issue applies the same technique (via ByteBufferOutputStream) to two other
classes.
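
For context, the technique: ByteArrayOutputStream.toByteArray copies the
internal buffer, so ByteBuffer.wrap(bos.toByteArray) pays for an extra copy.
A stream that wraps its internal buffer directly avoids that. A simplified
stand-in (not necessarily the exact Spark class):
{code}
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer

class ByteBufferOutputStream extends ByteArrayOutputStream {
  // Wrap the internal buffer in place instead of copying it via toByteArray.
  def toByteBuffer: ByteBuffer = ByteBuffer.wrap(buf, 0, count)
}
{code}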



[jira] [Updated] (SPARK-10561) Provide tooling for auto-generating Spark SQL reference manual

2015-11-30 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-10561:
---
Description: 
Here is the discussion thread:
http://search-hadoop.com/m/q3RTtcD20F1o62xE

Richard Hillegas made the following suggestion:

A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
support provided by the Scala language. I am new to programming in Scala, so I 
don't know whether the Scala ecosystem provides any good tools for 
reverse-engineering a BNF from a class which extends 
scala.util.parsing.combinator.syntactical.StandardTokenParsers.

  was:
Here is the discussion thread:
http://search-hadoop.com/m/q3RTtcD20F1o62xE

Richard Hillegas made the following suggestion:


A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
support provided by the Scala language. I am new to programming in Scala, so I 
don't know whether the Scala ecosystem provides any good tools for 
reverse-engineering a BNF from a class which extends 
scala.util.parsing.combinator.syntactical.StandardTokenParsers.


> Provide tooling for auto-generating Spark SQL reference manual
> --
>
> Key: SPARK-10561
> URL: https://issues.apache.org/jira/browse/SPARK-10561
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Reporter: Ted Yu
>
> Here is the discussion thread:
> http://search-hadoop.com/m/q3RTtcD20F1o62xE
> Richard Hillegas made the following suggestion:
> A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
> to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
> support provided by the Scala language. I am new to programming in Scala, so 
> I don't know whether the Scala ecosystem provides any good tools for 
> reverse-engineering a BNF from a class which extends 
> scala.util.parsing.combinator.syntactical.StandardTokenParsers.
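
For readers unfamiliar with the DSL being referred to, a toy grammar built on
StandardTokenParsers looks like the following (illustrative only, not Spark's
actual SQL parser):
{code}
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

object TinySql extends StandardTokenParsers {
  lexical.reserved ++= Seq("SELECT", "FROM")
  lexical.delimiters ++= Seq(",", "*")

  // Each production is an ordinary Scala value, which is what makes
  // mechanically extracting a BNF conceivable, if not easy.
  def select: Parser[Any] = "SELECT" ~ columns ~ "FROM" ~ ident
  def columns: Parser[Any] = "*" | rep1sep(ident, ",")
}
{code}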



[jira] [Commented] (SPARK-11206) Support SQL UI on the history server

2015-11-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15029183#comment-15029183
 ] 

Ted Yu commented on SPARK-11206:


Looks like SQLListenerMemoryLeakSuite fails on maven Jenkins now.
e.g.
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/4251/HADOOP_PROFILE=hadoop-2.4,label=spark-test/testReport/org.apache.spark.sql.execution.ui/SQLListenerMemoryLeakSuite/no_memory_leak/

> Support SQL UI on the history server
> 
>
> Key: SPARK-11206
> URL: https://issues.apache.org/jira/browse/SPARK-11206
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL, Web UI
>Reporter: Carson Wang
>Assignee: Carson Wang
> Fix For: 1.7.0
>
>
> On the live web UI, there is a SQL tab which provides valuable information 
> for the SQL query. But once the workload is finished, we won't see the SQL 
> tab on the history server. It will be helpful if we support SQL UI on the 
> history server so we can analyze it even after its execution.



[jira] [Resolved] (SPARK-11971) Start py4j callback server for Java Gateway

2015-11-24 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved SPARK-11971.

Resolution: Not A Problem

The callback server is started in _ensure_initialized of
python/pyspark/streaming/context.py.

> Start py4j callback server for Java Gateway
> ---
>
> Key: SPARK-11971
> URL: https://issues.apache.org/jira/browse/SPARK-11971
> Project: Spark
>  Issue Type: Improvement
>Reporter: Ted Yu
>Priority: Minor
>
> See the thread 'pyspark does not seem to start py4j callback server'
> This issue starts the py4j callback server for the Java Gateway.



[jira] [Created] (SPARK-11971) Start py4j callback server for Java Gateway

2015-11-24 Thread Ted Yu (JIRA)
Ted Yu created SPARK-11971:
--

 Summary: Start py4j callback server for Java Gateway
 Key: SPARK-11971
 URL: https://issues.apache.org/jira/browse/SPARK-11971
 Project: Spark
  Issue Type: Improvement
Reporter: Ted Yu
Priority: Minor


See the thread 'pyspark does not seem to start py4j callback server'.

This issue starts the py4j callback server for the Java Gateway.



[jira] [Commented] (SPARK-11884) Drop multiple columns in the DataFrame API

2015-11-24 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15024837#comment-15024837
 ] 

Ted Yu commented on SPARK-11884:


Is there interest in moving forward with the PR?

> Drop multiple columns in the DataFrame API
> --
>
> Key: SPARK-11884
> URL: https://issues.apache.org/jira/browse/SPARK-11884
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Ted Yu
>Priority: Minor
>
> See the thread Ben started:
> http://search-hadoop.com/m/q3RTtveEuhjsr7g/
> This issue adds a drop() method to DataFrame which accepts multiple column names.



[jira] [Created] (SPARK-11884) Drop multiple columns in the DataFrame API

2015-11-20 Thread Ted Yu (JIRA)
Ted Yu created SPARK-11884:
--

 Summary: Drop multiple columns in the DataFrame API
 Key: SPARK-11884
 URL: https://issues.apache.org/jira/browse/SPARK-11884
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Ted Yu
Priority: Minor


See the thread Ben started:
http://search-hadoop.com/m/q3RTtveEuhjsr7g/

This issue adds a drop() method to DataFrame which accepts multiple column names.
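
A sketch of what such an API could look like, with the signature inferred from
the description rather than taken from the PR:
{code}
import org.apache.spark.sql.DataFrame

object DropColumns {
  // Varargs drop built on the existing single-column DataFrame.drop.
  def drop(df: DataFrame, colNames: String*): DataFrame =
    colNames.foldLeft(df)((d, name) => d.drop(name))
}
{code}
Usage would then be e.g. DropColumns.drop(df, "a", "b").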




[jira] [Created] (SPARK-11872) Prevent the call to SparkContext#stop() in the listener bus's thread

2015-11-19 Thread Ted Yu (JIRA)
Ted Yu created SPARK-11872:
--

 Summary: Prevent the call to SparkContext#stop() in the listener 
bus's thread
 Key: SPARK-11872
 URL: https://issues.apache.org/jira/browse/SPARK-11872
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Ted Yu


This is a continuation of SPARK-11761.

Andrew suggested adding this protection. See the tail of PR #9741.



[jira] [Created] (SPARK-11761) Prevent the call to StreamingContext#stop() in the listener bus's thread

2015-11-16 Thread Ted Yu (JIRA)
Ted Yu created SPARK-11761:
--

 Summary: Prevent the call to StreamingContext#stop() in the 
listener bus's thread
 Key: SPARK-11761
 URL: https://issues.apache.org/jira/browse/SPARK-11761
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Reporter: Ted Yu


Quoting Shixiong's comment from https://github.com/apache/spark/pull/9723 :
{code}
The user should not call stop or other long-time work in a listener since it 
will block the listener thread, and prevent from stopping 
SparkContext/StreamingContext.

I cannot see an approach since we need to stop the listener bus's thread before 
stopping SparkContext/StreamingContext totally.
{code}
The proposed solution is to prevent the call to StreamingContext#stop() in the
listener bus's thread.
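
A minimal sketch of the kind of guard this implies (names and structure are
assumptions, not the actual patch):
{code}
class ToyListenerBus {
  private val busThread = new Thread("toy-listener-bus") {
    override def run(): Unit = { /* dispatch events to listeners */ }
  }

  // Called at the top of stop(): refuse to stop from the bus's own thread,
  // since stop() joins that thread and would otherwise deadlock.
  def assertNotBusThread(): Unit = {
    if (Thread.currentThread() eq busThread) {
      throw new IllegalStateException(
        "Cannot call stop() from within the listener bus thread")
    }
  }
}
{code}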



[jira] [Resolved] (SPARK-11572) Exit AsynchronousListenerBus thread when stop() is called

2015-11-16 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved SPARK-11572.

Resolution: Won't Fix

> Exit AsynchronousListenerBus thread when stop() is called
> -
>
> Key: SPARK-11572
> URL: https://issues.apache.org/jira/browse/SPARK-11572
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Ted Yu
>
> As vonnagy reported in the following thread:
> http://search-hadoop.com/m/q3RTtk982kvIow22
> Attempts to join the thread in AsynchronousListenerBus resulted in a lockup
> because the AsynchronousListenerBus thread was still receiving
> SparkListenerExecutorMetricsUpdate messages from the DAGScheduler.
> The proposed fix is to check the stopped flag within the loop of the
> AsynchronousListenerBus thread.



[jira] [Resolved] (SPARK-11699) TrackStateRDDSuite fails on Jenkins builds

2015-11-12 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved SPARK-11699.

Resolution: Duplicate

Same as SPARK-11290

> TrackStateRDDSuite fails on Jenkins builds
> --
>
> Key: SPARK-11699
> URL: https://issues.apache.org/jira/browse/SPARK-11699
> Project: Spark
>  Issue Type: Test
>Reporter: Ted Yu
>
> As of build #4087
> (https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/4087/testReport/),
> TrackStateRDDSuite fails for both hadoop profiles.



[jira] [Created] (SPARK-11699) TrackStateRDDSuite fails on Jenkins builds

2015-11-12 Thread Ted Yu (JIRA)
Ted Yu created SPARK-11699:
--

 Summary: TrackStateRDDSuite fails on Jenkins builds
 Key: SPARK-11699
 URL: https://issues.apache.org/jira/browse/SPARK-11699
 Project: Spark
  Issue Type: Test
Reporter: Ted Yu


As of build #4087
(https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/4087/testReport/),
TrackStateRDDSuite fails for both hadoop profiles.



[jira] [Commented] (SPARK-11661) We should still pushdown filters returned by a data source's unhandledFilters

2015-11-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002290#comment-15002290
 ] 

Ted Yu commented on SPARK-11661:


I should have looked back further.

But it seems the test failure is not intermittent.

> We should still pushdown filters returned by a data source's unhandledFilters
> -
>
> Key: SPARK-11661
> URL: https://issues.apache.org/jira/browse/SPARK-11661
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Yin Huai
>Priority: Blocker
> Fix For: 1.6.0
>
>
> We added the unhandledFilters interface in SPARK-10978. It gives a data source
> a chance to let Spark SQL know that it may not apply the returned filters to
> every row, so Spark SQL should use a Filter operator to evaluate those
> filters. However, if a filter is part of the returned unhandledFilters, we
> should still push it down. For example, our internal data sources do not
> override this method; if we do not push down those filters, we are actually
> turning off the filter pushdown feature.



[jira] [Commented] (SPARK-11661) We should still pushdown filters returned by a data source's unhandledFilters

2015-11-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002256#comment-15002256
 ] 

Ted Yu commented on SPARK-11661:


Looks like org.apache.spark.streaming.rdd.TrackStateRDDSuite started failing
after this change went in:
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/4084/

> We should still pushdown filters returned by a data source's unhandledFilters
> -
>
> Key: SPARK-11661
> URL: https://issues.apache.org/jira/browse/SPARK-11661
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Yin Huai
>Priority: Blocker
> Fix For: 1.6.0, 1.7.0
>
>
> We added the unhandledFilters interface in SPARK-10978. It gives a data source
> a chance to let Spark SQL know that it may not apply the returned filters to
> every row, so Spark SQL should use a Filter operator to evaluate those
> filters. However, if a filter is part of the returned unhandledFilters, we
> should still push it down. For example, our internal data sources do not
> override this method; if we do not push down those filters, we are actually
> turning off the filter pushdown feature.



[jira] [Created] (SPARK-11682) Commons-collections object deserialization may expose remote command execution vulnerability

2015-11-11 Thread Ted Yu (JIRA)
Ted Yu created SPARK-11682:
--

 Summary: Commons-collections object deserialization may expose 
remote command execution vulnerability
 Key: SPARK-11682
 URL: https://issues.apache.org/jira/browse/SPARK-11682
 Project: Spark
  Issue Type: Bug
Reporter: Ted Yu


http://foxglovesecurity.com/2015/11/06/what-do-weblogic-websphere-jboss-jenkins-opennms-and-your-application-have-in-common-this-vulnerability/

TL;DR: If you have commons-collections on your classpath and accept and process 
Java object serialization data, then you may have an exploitable remote command 
execution vulnerability.

In ./launcher/src/main/java/org/apache/spark/launcher/LauncherConnection.java :
{code}
  ObjectInputStream in = new ObjectInputStream(socket.getInputStream());
  while (!closed) {
    Message msg = (Message) in.readObject();
{code}

There may be other occurrence(s).



[jira] [Updated] (SPARK-11662) Call startExecutorDelegationTokenRenewer() ahead of client app submission

2015-11-11 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-11662:
---
Component/s: YARN

> Call startExecutorDelegationTokenRenewer() ahead of client app submission
> -
>
> Key: SPARK-11662
> URL: https://issues.apache.org/jira/browse/SPARK-11662
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Reporter: Ted Yu
>
> As reported in the thread 'Creating new Spark context when running in Secure
> YARN fails', an IOException may be thrown when SparkContext is stopped and
> started again while working with a secure YARN cluster:
> {code}
> 15/11/11 10:19:53 ERROR spark.SparkContext: Error initializing SparkContext.
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token
> can be issued only with kerberos or web authentication
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:6638)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:563)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:987)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> at org.apache.hadoop.ipc.Client.call(Client.java:1476)
> at org.apache.hadoop.ipc.Client.call(Client.java:1407)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> at com.sun.proxy.$Proxy12.getDelegationToken(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:933)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy13.getDelegationToken(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:1044)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1543)
> at
> org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:530)
> at
> org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:508)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2228)
> at
> org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$obtainTokensForNamenodes$1.apply(YarnSparkHadoopUtil.scala:126)
> at
> org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$obtainTokensForNamenodes$1.apply(YarnSparkHadoopUtil.scala:123)
> at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
> at
> org.apache.spark.deploy.yarn.YarnSparkHadoopUtil.obtainTokensForNamenodes(YarnSparkHadoopUtil.scala:123)
> at
> org.apache.spark.deploy.yarn.Client.getTokenRenewalInterval(Client.scala:495)
> at
> org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:528)
> at
> org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:628)
> at
> org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:119)
> at
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
> at
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
> at org.apache.spark.SparkContext.<init>(SparkContext.scala:523)
> {code}
> One fix is to call startExecutorDelegationTokenRenewer(conf) ahead of client 
> app submission.




[jira] [Created] (SPARK-11662) Call startExecutorDelegationTokenRenewer() ahead of client app submission

2015-11-11 Thread Ted Yu (JIRA)
Ted Yu created SPARK-11662:
--

 Summary: Call startExecutorDelegationTokenRenewer() ahead of 
client app submission
 Key: SPARK-11662
 URL: https://issues.apache.org/jira/browse/SPARK-11662
 Project: Spark
  Issue Type: Bug
Reporter: Ted Yu


As reported in the thread 'Creating new Spark context when running in Secure
YARN fails', an IOException may be thrown when SparkContext is stopped and
started again while working with a secure YARN cluster:
{code}
15/11/11 10:19:53 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token
can be issued only with kerberos or web authentication
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:6638)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:563)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:987)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

at org.apache.hadoop.ipc.Client.call(Client.java:1476)
at org.apache.hadoop.ipc.Client.call(Client.java:1407)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy12.getDelegationToken(Unknown Source)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:933)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy13.getDelegationToken(Unknown Source)
at
org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:1044)
at
org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1543)
at
org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:530)
at
org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:508)
at
org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2228)
at
org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$obtainTokensForNamenodes$1.apply(YarnSparkHadoopUtil.scala:126)
at
org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$obtainTokensForNamenodes$1.apply(YarnSparkHadoopUtil.scala:123)
at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
at
org.apache.spark.deploy.yarn.YarnSparkHadoopUtil.obtainTokensForNamenodes(YarnSparkHadoopUtil.scala:123)
at
org.apache.spark.deploy.yarn.Client.getTokenRenewalInterval(Client.scala:495)
at
org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:528)
at
org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:628)
at
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:119)
at
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at
org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:523)
{code}
One fix is to call startExecutorDelegationTokenRenewer(conf) ahead of client 
app submission.



[jira] [Created] (SPARK-11615) Drop @VisibleForTesting annotation

2015-11-09 Thread Ted Yu (JIRA)
Ted Yu created SPARK-11615:
--

 Summary: Drop @VisibleForTesting annotation
 Key: SPARK-11615
 URL: https://issues.apache.org/jira/browse/SPARK-11615
 Project: Spark
  Issue Type: Bug
Reporter: Ted Yu


See http://search-hadoop.com/m/q3RTtjpe8r1iRbTj2 for discussion.

Summary: the addition of the @VisibleForTesting annotation resulted in
spark-shell malfunctioning.



[jira] [Updated] (SPARK-11572) Exit AsynchronousListenerBus thread when stop() is called

2015-11-08 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-11572:
---
Component/s: Spark Core

> Exit AsynchronousListenerBus thread when stop() is called
> -
>
> Key: SPARK-11572
> URL: https://issues.apache.org/jira/browse/SPARK-11572
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Ted Yu
>
> As vonnagy reported in the following thread:
> http://search-hadoop.com/m/q3RTtk982kvIow22
> Attempts to join the thread in AsynchronousListenerBus resulted in a lockup
> because the AsynchronousListenerBus thread was still receiving
> SparkListenerExecutorMetricsUpdate messages from the DAGScheduler.
> The proposed fix is to check the stopped flag within the loop of the
> AsynchronousListenerBus thread.



[jira] [Created] (SPARK-11572) Exit AsynchronousListenerBus thread when stop() is called

2015-11-08 Thread Ted Yu (JIRA)
Ted Yu created SPARK-11572:
--

 Summary: Exit AsynchronousListenerBus thread when stop() is called
 Key: SPARK-11572
 URL: https://issues.apache.org/jira/browse/SPARK-11572
 Project: Spark
  Issue Type: Bug
Reporter: Ted Yu


As vonnagy reported in the following thread:
http://search-hadoop.com/m/q3RTtk982kvIow22

Attempts to join the thread in AsynchronousListenerBus resulted in a lockup
because the AsynchronousListenerBus thread was still receiving
SparkListenerExecutorMetricsUpdate messages from the DAGScheduler.

The proposed fix is to check the stopped flag within the loop of the
AsynchronousListenerBus thread.
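
A simplified stand-in showing the shape of the proposed fix; this is not
Spark's AsynchronousListenerBus, just the stopped-flag pattern:
{code}
import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

class ToyAsyncBus {
  private val stopped = new AtomicBoolean(false)
  private val queue = new LinkedBlockingQueue[String]()

  private val thread = new Thread("toy-async-bus") {
    override def run(): Unit = {
      // Re-check the stopped flag on every iteration so stop()/join() cannot
      // hang even if producers keep posting events.
      while (!stopped.get) {
        val event = queue.poll(100, TimeUnit.MILLISECONDS)
        if (event != null) println(s"handled: $event")
      }
    }
  }
  thread.start()

  def post(event: String): Unit = queue.put(event)
  def stop(): Unit = { stopped.set(true); thread.join() }
}
{code}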



[jira] [Commented] (SPARK-11371) Make "mean" an alias for "avg" operator

2015-11-02 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985438#comment-14985438
 ] 

Ted Yu commented on SPARK-11371:


[~rxin] [~yhuai]:
Your comments are welcome.

> Make "mean" an alias for "avg" operator
> ---
>
> Key: SPARK-11371
> URL: https://issues.apache.org/jira/browse/SPARK-11371
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Ted Yu
>Priority: Minor
> Attachments: spark-11371-v1.patch
>
>
> From Reynold in the thread 'Exception when using some aggregate operators'  
> (http://search-hadoop.com/m/q3RTt0xFr22nXB4/):
> I don't think these are bugs. The SQL standard for average is "avg", not 
> "mean". Similarly, a distinct count is supposed to be written as 
> "count(distinct col)", not "countDistinct(col)".
> We can, however, make "mean" an alias for "avg" to improve compatibility 
> between DataFrame and SQL.



[jira] [Commented] (SPARK-11371) Make "mean" an alias for "avg" operator

2015-11-02 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985424#comment-14985424
 ] 

Ted Yu commented on SPARK-11371:


[~sowen]:
Do you think it is worth adding the alias?

Thanks

> Make "mean" an alias for "avg" operator
> ---
>
> Key: SPARK-11371
> URL: https://issues.apache.org/jira/browse/SPARK-11371
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Ted Yu
>Priority: Minor
> Attachments: spark-11371-v1.patch
>
>
> From Reynold in the thread 'Exception when using some aggregate operators'  
> (http://search-hadoop.com/m/q3RTt0xFr22nXB4/):
> I don't think these are bugs. The SQL standard for average is "avg", not 
> "mean". Similarly, a distinct count is supposed to be written as 
> "count(distinct col)", not "countDistinct(col)".
> We can, however, make "mean" an alias for "avg" to improve compatibility 
> between DataFrame and SQL.



[jira] [Created] (SPARK-11442) Reduce numSlices for local metrics test of SparkListenerSuite

2015-11-01 Thread Ted Yu (JIRA)
Ted Yu created SPARK-11442:
--

 Summary: Reduce numSlices for local metrics test of 
SparkListenerSuite
 Key: SPARK-11442
 URL: https://issues.apache.org/jira/browse/SPARK-11442
 Project: Spark
  Issue Type: Test
  Components: Tests
Reporter: Ted Yu
Priority: Minor


In the thread
http://search-hadoop.com/m/q3RTtcQiFSlTxeP/test+failed+due+to+OOME&subj=test+failed+due+to+OOME,
it was discussed that memory consumption for SparkListenerSuite should be
brought down.

This is an attempt in that direction, reducing numSlices for the local metrics
test.

Before change:

Run completed in 57 seconds, 357 milliseconds.

Reducing numSlices to 16 results in:

Run completed in 44 seconds, 115 milliseconds.
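
For context, numSlices is the partition count passed to parallelize, so a
smaller value means fewer concurrent tasks and less memory. A sketch, assuming
a live SparkContext named sc (the suite's exact code is not reproduced here):
{code}
// 16 partitions instead of the larger value used before.
val rdd = sc.parallelize(1 to 1000, numSlices = 16)
rdd.count()
{code}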



[jira] [Commented] (SPARK-11435) Stop SparkContext at the end of subtest in SparkListenerSuite

2015-11-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984422#comment-14984422
 ] 

Ted Yu commented on SPARK-11435:


LocalSparkContext would close the SparkContext.

> Stop SparkContext at the end of subtest in SparkListenerSuite
> -
>
> Key: SPARK-11435
> URL: https://issues.apache.org/jira/browse/SPARK-11435
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Reporter: Ted Yu
>Priority: Minor
>
> Some subtests in SparkListenerSuite create a SparkContext without stopping it
> explicitly upon completion of the subtest.
> This issue is to stop SparkContext explicitly.



[jira] [Created] (SPARK-11435) Stop SparkContext at the end of subtest in SparkListenerSuite

2015-10-30 Thread Ted Yu (JIRA)
Ted Yu created SPARK-11435:
--

 Summary: Stop SparkContext at the end of subtest in 
SparkListenerSuite
 Key: SPARK-11435
 URL: https://issues.apache.org/jira/browse/SPARK-11435
 Project: Spark
  Issue Type: Improvement
  Components: Tests
Reporter: Ted Yu
Priority: Minor


Some subtests in SparkListenerSuite create a SparkContext without stopping it
explicitly upon completion of the subtest.

This issue is to stop SparkContext explicitly.
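
The pattern being requested, sketched (the SparkConf settings here are
placeholders, not the suite's actual setup):
{code}
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setMaster("local").setAppName("subtest")
val sc = new SparkContext(conf)
try {
  // ... subtest body ...
} finally {
  sc.stop()  // stop explicitly so contexts don't leak between subtests
}
{code}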



[jira] [Commented] (SPARK-11348) Replace addOnCompleteCallback with addTaskCompletionListener() in UnsafeExternalSorter

2015-10-30 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983392#comment-14983392
 ] 

Ted Yu commented on SPARK-11348:


Without this change, make-distribution.sh is likely to bump into an issue on
the master branch.

> Replace addOnCompleteCallback with addTaskCompletionListener() in 
> UnsafeExternalSorter
> --
>
> Key: SPARK-11348
> URL: https://issues.apache.org/jira/browse/SPARK-11348
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Ted Yu
>Priority: Minor
> Attachments: spark-11348.txt
>
>
> When running the command from SPARK-11318, I got the following:
> {code}
> [WARNING] 
> /home/hbase/spark/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java:[141,15]
>  [deprecation]  
> addOnCompleteCallback(Function0) in TaskContext has been deprecated
> {code}
> addOnCompleteCallback should be replaced with addTaskCompletionListener()



[jira] [Commented] (SPARK-11371) Make "mean" an alias for "avg" operator

2015-10-29 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14981887#comment-14981887
 ] 

Ted Yu commented on SPARK-11371:


That's true. 

> Make "mean" an alias for "avg" operator
> ---
>
> Key: SPARK-11371
> URL: https://issues.apache.org/jira/browse/SPARK-11371
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Ted Yu
>Priority: Minor
> Attachments: spark-11371-v1.patch
>
>
> From Reynold in the thread 'Exception when using some aggregate operators'  
> (http://search-hadoop.com/m/q3RTt0xFr22nXB4/):
> I don't think these are bugs. The SQL standard for average is "avg", not 
> "mean". Similarly, a distinct count is supposed to be written as 
> "count(distinct col)", not "countDistinct(col)".
> We can, however, make "mean" an alias for "avg" to improve compatibility 
> between DataFrame and SQL.



[jira] [Commented] (SPARK-11371) Make "mean" an alias for "avg" operator

2015-10-29 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14980484#comment-14980484
 ] 

Ted Yu commented on SPARK-11371:


Since I cannot assign the JIRA to myself, the attached patch shows my intention
to work on the JIRA.
The background is that I wanted to open 3 PRs as of yesterday, but I don't have
as many email addresses (i.e., forked repos).

I am more than willing to learn from experts how multiple outstanding PRs are 
managed.

As for the mean alias, I quoted Reynold's response.
I am open to discussion on whether this would ultimately go through.

> Make "mean" an alias for "avg" operator
> ---
>
> Key: SPARK-11371
> URL: https://issues.apache.org/jira/browse/SPARK-11371
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Ted Yu
>Priority: Minor
> Attachments: spark-11371-v1.patch
>
>
> From Reynold in the thread 'Exception when using some aggregate operators'  
> (http://search-hadoop.com/m/q3RTt0xFr22nXB4/):
> I don't think these are bugs. The SQL standard for average is "avg", not 
> "mean". Similarly, a distinct count is supposed to be written as 
> "count(distinct col)", not "countDistinct(col)".
> We can, however, make "mean" an alias for "avg" to improve compatibility 
> between DataFrame and SQL.



[jira] [Updated] (SPARK-10561) Provide tooling for auto-generating Spark SQL reference manual

2015-10-28 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-10561:
---
Description: 
Here is the discussion thread:
http://search-hadoop.com/m/q3RTtcD20F1o62xE

Richard Hillegas made the following suggestion:


A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
support provided by the Scala language. I am new to programming in Scala, so I 
don't know whether the Scala ecosystem provides any good tools for 
reverse-engineering a BNF from a class which extends 
scala.util.parsing.combinator.syntactical.StandardTokenParsers.

  was:
Here is the discussion thread:
http://search-hadoop.com/m/q3RTtcD20F1o62xE

Richard Hillegas made the following suggestion:

A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
support provided by the Scala language. I am new to programming in Scala, so I 
don't know whether the Scala ecosystem provides any good tools for 
reverse-engineering a BNF from a class which extends 
scala.util.parsing.combinator.syntactical.StandardTokenParsers.


> Provide tooling for auto-generating Spark SQL reference manual
> --
>
> Key: SPARK-10561
> URL: https://issues.apache.org/jira/browse/SPARK-10561
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Reporter: Ted Yu
>
> Here is the discussion thread:
> http://search-hadoop.com/m/q3RTtcD20F1o62xE
> Richard Hillegas made the following suggestion:
> A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
> to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
> support provided by the Scala language. I am new to programming in Scala, so 
> I don't know whether the Scala ecosystem provides any good tools for 
> reverse-engineering a BNF from a class which extends 
> scala.util.parsing.combinator.syntactical.StandardTokenParsers.



[jira] [Updated] (SPARK-11371) Make "mean" an alias for "avg" operator

2015-10-28 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-11371:
---
Attachment: spark-11371-v1.patch

> Make "mean" an alias for "avg" operator
> ---
>
> Key: SPARK-11371
> URL: https://issues.apache.org/jira/browse/SPARK-11371
> Project: Spark
>  Issue Type: Improvement
>Reporter: Ted Yu
>Priority: Minor
> Attachments: spark-11371-v1.patch
>
>
> From Reynold in the thread 'Exception when using some aggregate operators'  
> (http://search-hadoop.com/m/q3RTt0xFr22nXB4/):
> I don't think these are bugs. The SQL standard for average is "avg", not 
> "mean". Similarly, a distinct count is supposed to be written as 
> "count(distinct col)", not "countDistinct(col)".
> We can, however, make "mean" an alias for "avg" to improve compatibility 
> between DataFrame and SQL.



[jira] [Created] (SPARK-11371) Make "mean" an alias for "avg" operator

2015-10-28 Thread Ted Yu (JIRA)
Ted Yu created SPARK-11371:
--

 Summary: Make "mean" an alias for "avg" operator
 Key: SPARK-11371
 URL: https://issues.apache.org/jira/browse/SPARK-11371
 Project: Spark
  Issue Type: Improvement
Reporter: Ted Yu
Priority: Minor


From Reynold in the thread 'Exception when using some aggregate operators'
(http://search-hadoop.com/m/q3RTt0xFr22nXB4/):

I don't think these are bugs. The SQL standard for average is "avg", not 
"mean". Similarly, a distinct count is supposed to be written as 
"count(distinct col)", not "countDistinct(col)".

We can, however, make "mean" an alias for "avg" to improve compatibility 
between DataFrame and SQL.
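
What the alias would enable, sketched against a Spark 1.x sqlContext; without
the alias only the first expression resolves:
{code}
val df = sqlContext.range(0, 10)
// With "mean" registered as an alias of "avg", both resolve to Average:
df.selectExpr("avg(id)", "mean(id)").show()
{code}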



[jira] [Updated] (SPARK-11348) Replace addOnCompleteCallback with addTaskCompletionListener() in UnsafeExternalSorter

2015-10-27 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-11348:
---
Priority: Minor  (was: Trivial)

> Replace addOnCompleteCallback with addTaskCompletionListener() in 
> UnsafeExternalSorter
> --
>
> Key: SPARK-11348
> URL: https://issues.apache.org/jira/browse/SPARK-11348
> Project: Spark
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
> Attachments: spark-11348.txt
>
>
> When running the command from SPARK-11318, I got the following:
> {code}
> [WARNING] 
> /home/hbase/spark/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java:[141,15]
>  [deprecation]  
> addOnCompleteCallback(Function0) in TaskContext has been deprecated
> {code}
> addOnCompleteCallback should be replaced with addTaskCompletionListener()



[jira] [Updated] (SPARK-11348) Replace addOnCompleteCallback with addTaskCompletionListener() in UnsafeExternalSorter

2015-10-27 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-11348:
---
Attachment: spark-11348.txt

> Replace addOnCompleteCallback with addTaskCompletionListener() in 
> UnsafeExternalSorter
> --
>
> Key: SPARK-11348
> URL: https://issues.apache.org/jira/browse/SPARK-11348
> Project: Spark
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Trivial
> Attachments: spark-11348.txt
>
>
> When running the command from SPARK-11318, I got the following:
> {code}
> [WARNING] 
> /home/hbase/spark/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java:[141,15]
>  [deprecation]  
> addOnCompleteCallback(Function0) in TaskContext has been deprecated
> {code}
> addOnCompleteCallback should be replaced with addTaskCompletionListener()



[jira] [Created] (SPARK-11348) Replace addOnCompleteCallback with addTaskCompletionListener() in UnsafeExternalSorter

2015-10-27 Thread Ted Yu (JIRA)
Ted Yu created SPARK-11348:
--

 Summary: Replace addOnCompleteCallback with 
addTaskCompletionListener() in UnsafeExternalSorter
 Key: SPARK-11348
 URL: https://issues.apache.org/jira/browse/SPARK-11348
 Project: Spark
  Issue Type: Bug
Reporter: Ted Yu
Priority: Trivial


When running the command from SPARK-11318, I got the following:
{code}
[WARNING] 
/home/hbase/spark/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java:[141,15]
 [deprecation]  
addOnCompleteCallback(Function0) in TaskContext has been deprecated
{code}
addOnCompleteCallback should be replaced with addTaskCompletionListener()
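
The replacement, sketched (assumes a TaskContext named context in scope and a
hypothetical cleanup() helper):
{code}
// Deprecated:
//   context.addOnCompleteCallback(() => cleanup())
// Replacement:
context.addTaskCompletionListener { _ => cleanup() }
{code}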



[jira] [Created] (SPARK-11318) [DOC] Include hive profile in make-distribution.sh command

2015-10-26 Thread Ted Yu (JIRA)
Ted Yu created SPARK-11318:
--

 Summary: [DOC] Include hive profile in make-distribution.sh command
 Key: SPARK-11318
 URL: https://issues.apache.org/jira/browse/SPARK-11318
 Project: Spark
  Issue Type: Improvement
Reporter: Ted Yu
Priority: Minor


The tgz I built using the current command shown in building-spark.html does not
produce the datanucleus jars which are included in the "boxed" Spark
distributions.

The hive profile should be included so that the tarball matches the Spark
distribution.

See the 'Problem with make-distribution.sh' thread on user@ for background.



[jira] [Resolved] (SPARK-11286) Make Outbox stopped exception singleton

2015-10-23 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved SPARK-11286.

Resolution: Won't Fix

> Make Outbox stopped exception singleton
> ---
>
> Key: SPARK-11286
> URL: https://issues.apache.org/jira/browse/SPARK-11286
> Project: Spark
>  Issue Type: Improvement
>Reporter: Ted Yu
>Priority: Trivial
>
> In two places in Outbox.scala, a new SparkException is created for the Outbox
> stopped condition.
> Create a singleton for the Outbox stopped exception and use it instead of
> creating the exception every time.



[jira] [Created] (SPARK-11286) Make Outbox stopped exception singleton

2015-10-23 Thread Ted Yu (JIRA)
Ted Yu created SPARK-11286:
--

 Summary: Make Outbox stopped exception singleton
 Key: SPARK-11286
 URL: https://issues.apache.org/jira/browse/SPARK-11286
 Project: Spark
  Issue Type: Improvement
Reporter: Ted Yu
Priority: Trivial


In two places in Outbox.scala, a new SparkException is created for the Outbox
stopped condition.

Create a singleton for the Outbox stopped exception and use it instead of
creating the exception every time.
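
Sketched, the change would look something like this (simplified, with a generic
exception type standing in for SparkException):
{code}
object OutboxStopped {
  // One shared instance instead of a new exception at each call site.
  val exception = new IllegalStateException("Outbox is stopped")
}

// call sites: throw OutboxStopped.exception
{code}
One trade-off of sharing an exception instance is that its stack trace reflects
where it was created rather than where it was thrown, which may be relevant to
this being resolved as Won't Fix.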



[jira] [Created] (SPARK-11172) Close JsonParser/Generator in test

2015-10-17 Thread Ted Yu (JIRA)
Ted Yu created SPARK-11172:
--

 Summary: Close JsonParser/Generator in test
 Key: SPARK-11172
 URL: https://issues.apache.org/jira/browse/SPARK-11172
 Project: Spark
  Issue Type: Task
Reporter: Ted Yu
Priority: Trivial


JsonParser / Generator instances created in tests should be closed.

This is a continuation of SPARK-11124.
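
The pattern being requested, sketched with the Jackson 2.x API (the factory and
input here are illustrative):
{code}
import com.fasterxml.jackson.core.JsonFactory

val factory = new JsonFactory()
val parser = factory.createParser("""{"a": 1}""")
try {
  // ... assertions against the parsed stream ...
} finally {
  parser.close()  // close explicitly rather than relying on GC
}
{code}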



[jira] [Updated] (SPARK-10561) Provide tooling for auto-generating Spark SQL reference manual

2015-10-16 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-10561:
---
Description: 
Here is the discussion thread:
http://search-hadoop.com/m/q3RTtcD20F1o62xE

Richard Hillegas made the following suggestion:

A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
support provided by the Scala language. I am new to programming in Scala, so I 
don't know whether the Scala ecosystem provides any good tools for 
reverse-engineering a BNF from a class which extends 
scala.util.parsing.combinator.syntactical.StandardTokenParsers.

  was:
Here is the discussion thread:
http://search-hadoop.com/m/q3RTtcD20F1o62xE

Richard Hillegas made the following suggestion:


A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
support provided by the Scala language. I am new to programming in Scala, so I 
don't know whether the Scala ecosystem provides any good tools for 
reverse-engineering a BNF from a class which extends 
scala.util.parsing.combinator.syntactical.StandardTokenParsers.


> Provide tooling for auto-generating Spark SQL reference manual
> --
>
> Key: SPARK-10561
> URL: https://issues.apache.org/jira/browse/SPARK-10561
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Reporter: Ted Yu
>
> Here is the discussion thread:
> http://search-hadoop.com/m/q3RTtcD20F1o62xE
> Richard Hillegas made the following suggestion:
> A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
> to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
> support provided by the Scala language. I am new to programming in Scala, so 
> I don't know whether the Scala ecosystem provides any good tools for 
> reverse-engineering a BNF from a class which extends 
> scala.util.parsing.combinator.syntactical.StandardTokenParsers.



[jira] [Commented] (SPARK-10985) Avoid passing evicted blocks throughout BlockManager / CacheManager

2015-10-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954202#comment-14954202
 ] 

Ted Yu commented on SPARK-10985:


I am a bit confused by the assignment of this JIRA.
Normally the assignee is 'Apache Spark' before a pull request comes up.

It has been at least 2 days but I don't see a PR.

Did I miss something?

> Avoid passing evicted blocks throughout BlockManager / CacheManager
> ---
>
> Key: SPARK-10985
> URL: https://issues.apache.org/jira/browse/SPARK-10985
> Project: Spark
>  Issue Type: Sub-task
>  Components: Block Manager, Spark Core
>Reporter: Andrew Or
>Assignee: Bowen Zhang
>Priority: Minor
>
> This is a minor refactoring task.
> Currently when we attempt to put a block in, we get back an array buffer of 
> blocks that are dropped in the process. We do this to propagate these blocks 
> back to our TaskContext, which will add them to its TaskMetrics so we can see 
> them in the SparkUI storage tab properly.
> Now that we have TaskContext.get, we can just use that to propagate this 
> information. This simplifies a lot of the signatures and gets rid of weird 
> return types like the following everywhere:
> {code}
> ArrayBuffer[(BlockId, BlockStatus)]
> {code}
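
A sketch of the direction described; the metrics field name follows the 1.x
TaskMetrics API and should be treated as an assumption, as should
droppedBlocks:
{code}
import org.apache.spark.TaskContext

// Instead of returning ArrayBuffer[(BlockId, BlockStatus)] up the call chain,
// report the dropped blocks through the current task's context:
Option(TaskContext.get()).foreach { tc =>
  tc.taskMetrics.updatedBlocks = Some(droppedBlocks)
}
{code}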



[jira] [Resolved] (SPARK-11048) Use ForkJoinPool as executorService

2015-10-11 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved SPARK-11048.

Resolution: Won't Fix

> Use ForkJoinPool as executorService
> ---
>
> Key: SPARK-11048
> URL: https://issues.apache.org/jira/browse/SPARK-11048
> Project: Spark
>  Issue Type: Improvement
>Reporter: Ted Yu
>Priority: Minor
>
> ForkJoinPool: threads are created only if there are waiting tasks. They
> expire after 2 seconds (hardcoded in the JDK code).
> ForkJoinPool is better than ThreadPoolExecutor.
> It's available in JDK 1.7.



[jira] [Created] (SPARK-11048) Use ForkJoinPool as executorService

2015-10-10 Thread Ted Yu (JIRA)
Ted Yu created SPARK-11048:
--

 Summary: Use ForkJoinPool as executorService
 Key: SPARK-11048
 URL: https://issues.apache.org/jira/browse/SPARK-11048
 Project: Spark
  Issue Type: Improvement
Reporter: Ted Yu
Priority: Minor


ForkJoinPool: threads are created only if there are waiting tasks. They expire
after 2 seconds (hardcoded in the JDK code).
ForkJoinPool is better than ThreadPoolExecutor.
It's available in JDK 1.7.
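
For reference, a ForkJoinPool drops in wherever an ExecutorService is expected
(JDK 7+; a minimal sketch):
{code}
import java.util.concurrent.{ExecutorService, ForkJoinPool}

val pool: ExecutorService = new ForkJoinPool(8)  // parallelism of 8
pool.execute(new Runnable {
  override def run(): Unit = println("task ran")
})
pool.shutdown()
{code}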



[jira] [Created] (SPARK-11006) Rename NullColumnAccess as NullColumnAccessor

2015-10-08 Thread Ted Yu (JIRA)
Ted Yu created SPARK-11006:
--

 Summary: Rename NullColumnAccess as NullColumnAccessor
 Key: SPARK-11006
 URL: https://issues.apache.org/jira/browse/SPARK-11006
 Project: Spark
  Issue Type: Task
Reporter: Ted Yu
Priority: Trivial


In sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnAccessor.scala,
NullColumnAccess should be renamed to NullColumnAccessor so that the same
convention is adhered to for the accessors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4066) Make whether maven builds fails on scalastyle violation configurable

2015-10-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-4066:
--
Description: 
Here is the thread Koert started:
http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bit&subj=scalastyle+annoys+me+a+little+bit

It would be more flexible if whether the maven build fails due to scalastyle 
violations were configurable.

  was:
Here is the thread Koert started:

http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bit&subj=scalastyle+annoys+me+a+little+bit

It would be more flexible if whether the maven build fails due to scalastyle 
violations were configurable.


> Make whether maven builds fails on scalastyle violation configurable
> 
>
> Key: SPARK-4066
> URL: https://issues.apache.org/jira/browse/SPARK-4066
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Ted Yu
>Priority: Minor
>  Labels: style
> Attachments: spark-4066-v1.txt
>
>
> Here is the thread Koert started:
> http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bit&subj=scalastyle+annoys+me+a+little+bit
> It would be more flexible if whether the maven build fails due to scalastyle 
> violations were configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10561) Provide tooling for auto-generating Spark SQL reference manual

2015-10-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-10561:
---
Description: 
Here is the discussion thread:
http://search-hadoop.com/m/q3RTtcD20F1o62xE

Richard Hillegas made the following suggestion:


A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
support provided by the Scala language. I am new to programming in Scala, so I 
don't know whether the Scala ecosystem provides any good tools for 
reverse-engineering a BNF from a class which extends 
scala.util.parsing.combinator.syntactical.StandardTokenParsers.

  was:
Here is the discussion thread:
http://search-hadoop.com/m/q3RTtcD20F1o62xE

Richard Hillegas made the following suggestion:

A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
support provided by the Scala language. I am new to programming in Scala, so I 
don't know whether the Scala ecosystem provides any good tools for 
reverse-engineering a BNF from a class which extends 
scala.util.parsing.combinator.syntactical.StandardTokenParsers.


> Provide tooling for auto-generating Spark SQL reference manual
> --
>
> Key: SPARK-10561
> URL: https://issues.apache.org/jira/browse/SPARK-10561
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Reporter: Ted Yu
>
> Here is the discussion thread:
> http://search-hadoop.com/m/q3RTtcD20F1o62xE
> Richard Hillegas made the following suggestion:
> A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
> to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
> support provided by the Scala language. I am new to programming in Scala, so 
> I don't know whether the Scala ecosystem provides any good tools for 
> reverse-engineering a BNF from a class which extends 
> scala.util.parsing.combinator.syntactical.StandardTokenParsers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10787) Consider replacing ObjectOutputStream for serialization to prevent OOME

2015-10-01 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-10787:
---
Priority: Major  (was: Minor)

> Consider replacing ObjectOutputStream for serialization to prevent OOME
> ---
>
> Key: SPARK-10787
> URL: https://issues.apache.org/jira/browse/SPARK-10787
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Ted Yu
>
> In the thread 'Spark ClosureCleaner or java serializer OOM when trying to 
> grow' (http://search-hadoop.com/m/q3RTtAr5X543dNn), Jay Luan reported that 
> ClosureCleaner#ensureSerializable() resulted in an OOME.
> The cause was that ObjectOutputStream keeps a strong reference to every 
> object that was written to it.
> This issue tries to avoid OOME by considering an alternative to 
> ObjectOutputStream.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10787) Consider replacing ObjectOutputStream for serialization to prevent OOME

2015-10-01 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-10787:
---
Summary: Consider replacing ObjectOutputStream for serialization to prevent 
OOME  (was: Reset ObjectOutputStream more often to prevent OOME)

> Consider replacing ObjectOutputStream for serialization to prevent OOME
> ---
>
> Key: SPARK-10787
> URL: https://issues.apache.org/jira/browse/SPARK-10787
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Ted Yu
>Priority: Minor
>
> In the thread, Spark ClosureCleaner or java serializer OOM when trying to 
> grow (http://search-hadoop.com/m/q3RTtAr5X543dNn), Jay Luan reported that 
> ClosureCleaner#ensureSerializable() resulted in OOME.
> The cause was that ObjectOutputStream keeps a strong reference of every 
> object that was written to it.
> This issue tries to avoid OOME by calling reset() more often.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10787) Consider replacing ObjectOutputStream for serialization to prevent OOME

2015-10-01 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-10787:
---
Description: 
In the thread 'Spark ClosureCleaner or java serializer OOM when trying to grow' 
(http://search-hadoop.com/m/q3RTtAr5X543dNn), Jay Luan reported that 
ClosureCleaner#ensureSerializable() resulted in an OOME.

The cause was that ObjectOutputStream keeps a strong reference to every object 
that was written to it.

This issue tries to avoid OOME by considering an alternative to ObjectOutputStream.

  was:
In the thread 'Spark ClosureCleaner or java serializer OOM when trying to grow' 
(http://search-hadoop.com/m/q3RTtAr5X543dNn), Jay Luan reported that 
ClosureCleaner#ensureSerializable() resulted in an OOME.

The cause was that ObjectOutputStream keeps a strong reference to every object 
that was written to it.

This issue tries to avoid OOME by calling reset() more often.


> Consider replacing ObjectOutputStream for serialization to prevent OOME
> ---
>
> Key: SPARK-10787
> URL: https://issues.apache.org/jira/browse/SPARK-10787
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Ted Yu
>Priority: Minor
>
> In the thread 'Spark ClosureCleaner or java serializer OOM when trying to 
> grow' (http://search-hadoop.com/m/q3RTtAr5X543dNn), Jay Luan reported that 
> ClosureCleaner#ensureSerializable() resulted in an OOME.
> The cause was that ObjectOutputStream keeps a strong reference to every 
> object that was written to it.
> This issue tries to avoid OOME by considering an alternative to 
> ObjectOutputStream.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10561) Provide tooling for auto-generating Spark SQL reference manual

2015-09-30 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-10561:
---
Description: 
Here is the discussion thread:
http://search-hadoop.com/m/q3RTtcD20F1o62xE

Richard Hillegas made the following suggestion:

A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
support provided by the Scala language. I am new to programming in Scala, so I 
don't know whether the Scala ecosystem provides any good tools for 
reverse-engineering a BNF from a class which extends 
scala.util.parsing.combinator.syntactical.StandardTokenParsers.

  was:
Here is the discussion thread:
http://search-hadoop.com/m/q3RTtcD20F1o62xE

Richard Hillegas made the following suggestion:

A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
support provided by the Scala language. I am new to programming in Scala, so I 
don't know whether the Scala ecosystem provides any good tools for 
reverse-engineering a BNF from a class which extends 
scala.util.parsing.combinator.syntactical.StandardTokenParsers.



> Provide tooling for auto-generating Spark SQL reference manual
> --
>
> Key: SPARK-10561
> URL: https://issues.apache.org/jira/browse/SPARK-10561
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Reporter: Ted Yu
>
> Here is the discussion thread:
> http://search-hadoop.com/m/q3RTtcD20F1o62xE
> Richard Hillegas made the following suggestion:
> A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
> to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
> support provided by the Scala language. I am new to programming in Scala, so 
> I don't know whether the Scala ecosystem provides any good tools for 
> reverse-engineering a BNF from a class which extends 
> scala.util.parsing.combinator.syntactical.StandardTokenParsers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10787) Reset ObjectOutputStream more often to prevent OOME

2015-09-28 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934292#comment-14934292
 ] 

Ted Yu commented on SPARK-10787:


I can think of two approaches:
1. Cloning ObjectOutputStream so that it uses weak references. We need to 
consider licensing. Also, ObjectOutputStream may reference Java-internal 
methods / fields. This makes maintaining the clone difficult.

2. Switching to Kryo-based serialization.
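
For reference, a minimal sketch of the reset()-based mitigation from the 
description (the interval and the input collection are arbitrary):
{code}
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

val objects: Seq[AnyRef] = (1 to 10000).map(i => s"item-$i")
val oos = new ObjectOutputStream(new ByteArrayOutputStream())
objects.zipWithIndex.foreach { case (obj, i) =>
  oos.writeObject(obj)
  // reset() clears the handle table, dropping the strong references
  // the stream keeps to every object written so far
  if ((i + 1) % 1000 == 0) oos.reset()
}
oos.close()
{code}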

> Reset ObjectOutputStream more often to prevent OOME
> ---
>
> Key: SPARK-10787
> URL: https://issues.apache.org/jira/browse/SPARK-10787
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Ted Yu
>Priority: Minor
>
> In the thread 'Spark ClosureCleaner or java serializer OOM when trying to 
> grow' (http://search-hadoop.com/m/q3RTtAr5X543dNn), Jay Luan reported that 
> ClosureCleaner#ensureSerializable() resulted in an OOME.
> The cause was that ObjectOutputStream keeps a strong reference to every 
> object that was written to it.
> This issue tries to avoid OOME by calling reset() more often.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4066) Make whether maven builds fails on scalastyle violation configurable

2015-09-27 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-4066:
--
Description: 
Here is the thread Koert started:

http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bit&subj=scalastyle+annoys+me+a+little+bit

It would be more flexible if whether the maven build fails due to scalastyle 
violations were configurable.

  was:
Here is the thread Koert started:

http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bit&subj=scalastyle+annoys+me+a+little+bit


It would be more flexible if whether the maven build fails due to scalastyle 
violations were configurable.


> Make whether maven builds fails on scalastyle violation configurable
> 
>
> Key: SPARK-4066
> URL: https://issues.apache.org/jira/browse/SPARK-4066
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Ted Yu
>Priority: Minor
>  Labels: style
> Attachments: spark-4066-v1.txt
>
>
> Here is the thread Koert started:
> http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bit&subj=scalastyle+annoys+me+a+little+bit
> It would be more flexible if whether the maven build fails due to scalastyle 
> violations were configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10561) Provide tooling for auto-generating Spark SQL reference manual

2015-09-25 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-10561:
---
Description: 
Here is the discussion thread:
http://search-hadoop.com/m/q3RTtcD20F1o62xE

Richard Hillegas made the following suggestion:

A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
support provided by the Scala language. I am new to programming in Scala, so I 
don't know whether the Scala ecosystem provides any good tools for 
reverse-engineering a BNF from a class which extends 
scala.util.parsing.combinator.syntactical.StandardTokenParsers.


  was:
Here is the discussion thread:
http://search-hadoop.com/m/q3RTtcD20F1o62xE

Richard Hillegas made the following suggestion:

A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
support provided by the Scala language. I am new to programming in Scala, so I 
don't know whether the Scala ecosystem provides any good tools for 
reverse-engineering a BNF from a class which extends 
scala.util.parsing.combinator.syntactical.StandardTokenParsers.


> Provide tooling for auto-generating Spark SQL reference manual
> --
>
> Key: SPARK-10561
> URL: https://issues.apache.org/jira/browse/SPARK-10561
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Reporter: Ted Yu
>
> Here is the discussion thread:
> http://search-hadoop.com/m/q3RTtcD20F1o62xE
> Richard Hillegas made the following suggestion:
> A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
> to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
> support provided by the Scala language. I am new to programming in Scala, so 
> I don't know whether the Scala ecosystem provides any good tools for 
> reverse-engineering a BNF from a class which extends 
> scala.util.parsing.combinator.syntactical.StandardTokenParsers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10787) Reset ObjectOutputStream more often to prevent OOME

2015-09-23 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-10787:
---
Description: 
In the thread 'Spark ClosureCleaner or java serializer OOM when trying to grow' 
(http://search-hadoop.com/m/q3RTtAr5X543dNn), Jay Luan reported that 
ClosureCleaner#ensureSerializable() resulted in an OOME.

The cause was that ObjectOutputStream keeps a strong reference to every object 
that was written to it.

This issue tries to avoid OOME by calling reset() more often.

  was:
In the thread 'Spark ClosureCleaner or java serializer OOM when trying to grow', 
Jay Luan reported that ClosureCleaner#ensureSerializable() resulted in an OOME.

The cause was that ObjectOutputStream keeps a strong reference to every object 
that was written to it.

This issue tries to avoid OOME by calling reset() more often.


> Reset ObjectOutputStream more often to prevent OOME
> ---
>
> Key: SPARK-10787
> URL: https://issues.apache.org/jira/browse/SPARK-10787
> Project: Spark
>  Issue Type: Bug
>Reporter: Ted Yu
>
> In the thread 'Spark ClosureCleaner or java serializer OOM when trying to 
> grow' (http://search-hadoop.com/m/q3RTtAr5X543dNn), Jay Luan reported that 
> ClosureCleaner#ensureSerializable() resulted in an OOME.
> The cause was that ObjectOutputStream keeps a strong reference to every 
> object that was written to it.
> This issue tries to avoid OOME by calling reset() more often.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10787) Reset ObjectOutputStream more often to prevent OOME

2015-09-23 Thread Ted Yu (JIRA)
Ted Yu created SPARK-10787:
--

 Summary: Reset ObjectOutputStream more often to prevent OOME
 Key: SPARK-10787
 URL: https://issues.apache.org/jira/browse/SPARK-10787
 Project: Spark
  Issue Type: Bug
Reporter: Ted Yu


In the thread 'Spark ClosureCleaner or java serializer OOM when trying to grow', 
Jay Luan reported that ClosureCleaner#ensureSerializable() resulted in an OOME.

The cause was that ObjectOutputStream keeps a strong reference to every object 
that was written to it.

This issue tries to avoid OOME by calling reset() more often.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10721) Log warning when file deletion fails

2015-09-20 Thread Ted Yu (JIRA)
Ted Yu created SPARK-10721:
--

 Summary: Log warning when file deletion fails
 Key: SPARK-10721
 URL: https://issues.apache.org/jira/browse/SPARK-10721
 Project: Spark
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


There are several places in the code base where the return value from 
File.delete() is ignored.

This issue adds a check of the boolean return value and logs a warning when 
deletion fails.
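
A minimal sketch of the check (assuming the enclosing class mixes in Spark's 
Logging trait, which provides logWarning):
{code}
import java.io.File

def deleteWithWarning(file: File): Unit = {
  if (!file.delete()) {
    // previously the boolean result was silently discarded
    logWarning(s"Failed to delete file ${file.getAbsolutePath}")
  }
}
{code}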



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10701) Expose SparkContext#stopped flag with @DeveloperApi

2015-09-18 Thread Ted Yu (JIRA)
Ted Yu created SPARK-10701:
--

 Summary: Expose SparkContext#stopped flag with @DeveloperApi
 Key: SPARK-10701
 URL: https://issues.apache.org/jira/browse/SPARK-10701
 Project: Spark
  Issue Type: Improvement
Reporter: Ted Yu
Priority: Minor


SPARK-9522 added the stopped flag as private[spark].

See this thread: http://search-hadoop.com/m/q3RTtqvncy17sSTx1

We should expose this flag to developers.
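
A hedged sketch of what the exposure could look like (assuming stopped is the 
AtomicBoolean introduced by SPARK-9522):
{code}
// in SparkContext; a sketch, not a committed API
@DeveloperApi
def isStopped: Boolean = stopped.get()
{code}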



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10561) Provide tooling for auto-generating Spark SQL reference manual

2015-09-11 Thread Ted Yu (JIRA)
Ted Yu created SPARK-10561:
--

 Summary: Provide tooling for auto-generating Spark SQL reference 
manual
 Key: SPARK-10561
 URL: https://issues.apache.org/jira/browse/SPARK-10561
 Project: Spark
  Issue Type: Improvement
Reporter: Ted Yu


Here is the discussion thread:
http://search-hadoop.com/m/q3RTtcD20F1o62xE

Richard Hillegas made the following suggestion:

A machine-generated BNF, however, is easy to imagine. But perhaps not so easy 
to implement. Spark's SQL grammar is implemented in Scala, extending the DSL 
support provided by the Scala language. I am new to programming in Scala, so I 
don't know whether the Scala ecosystem provides any good tools for 
reverse-engineering a BNF from a class which extends 
scala.util.parsing.combinator.syntactical.StandardTokenParsers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10546) Check partitionId's range in ExternalSorter#spill()

2015-09-10 Thread Ted Yu (JIRA)
Ted Yu created SPARK-10546:
--

 Summary: Check partitionId's range in ExternalSorter#spill()
 Key: SPARK-10546
 URL: https://issues.apache.org/jira/browse/SPARK-10546
 Project: Spark
  Issue Type: Improvement
Affects Versions: 1.4.1
Reporter: Ted Yu
Priority: Minor


See this thread for background:
http://search-hadoop.com/m/q3RTt0rWvIkHAE81

We should check the range of the partition id and provide a meaningful message 
through an exception.

Alternatively, we could use abs() and modulo to force the partition id into a 
legitimate range. However, the expectation is that the user should correct the 
logic error in his / her code.
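
A minimal sketch of the proposed check (names assumed):
{code}
// fail fast with a meaningful message instead of an opaque error at spill time
def checkPartitionId(partitionId: Int, numPartitions: Int): Unit = {
  require(partitionId >= 0 && partitionId < numPartitions,
    s"partition id $partitionId should be in the range [0, $numPartitions)")
}
{code}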



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4066) Make whether maven builds fails on scalastyle violation configurable

2015-08-27 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-4066:
--
Description: 
Here is the thread Koert started:

http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bit&subj=scalastyle+annoys+me+a+little+bit


It would be more flexible if whether the maven build fails due to scalastyle 
violations were configurable.

  was:
Here is the thread Koert started:

http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bit&subj=scalastyle+annoys+me+a+little+bit

It would be more flexible if whether the maven build fails due to scalastyle 
violations were configurable.


> Make whether maven builds fails on scalastyle violation configurable
> 
>
> Key: SPARK-4066
> URL: https://issues.apache.org/jira/browse/SPARK-4066
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Ted Yu
>Priority: Minor
>  Labels: style
> Attachments: spark-4066-v1.txt
>
>
> Here is the thread Koert started:
> http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bit&subj=scalastyle+annoys+me+a+little+bit
> It would be more flexible if whether the maven build fails due to scalastyle 
> violations were configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-10074) Include Float in @specialized annotation

2015-08-24 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved SPARK-10074.

Resolution: Later

Until the need arises.

> Include Float in @specialized annotation
> 
>
> Key: SPARK-10074
> URL: https://issues.apache.org/jira/browse/SPARK-10074
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Ted Yu
>Priority: Minor
>
> There are several places in the Spark codebase where we use the @specialized 
> annotation covering Long and Double.
> e.g. in OpenHashMap.scala :
> {code}
> class OpenHashMap[K : ClassTag, @specialized(Long, Int, Double) V: ClassTag](
> initialCapacity: Int)
> {code}
> Float should be added to the @specialized annotation as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10074) Include Float in @specialized annotation

2015-08-18 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701388#comment-14701388
 ] 

Ted Yu commented on SPARK-10074:


I would argue that Double should be taken out of MutablePair (and other 
pertinent classes) until there is a use for it.

> Include Float in @specialized annotation
> 
>
> Key: SPARK-10074
> URL: https://issues.apache.org/jira/browse/SPARK-10074
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Ted Yu
>Priority: Minor
>
> There are several places in the Spark codebase where we use the @specialized 
> annotation covering Long and Double.
> e.g. in OpenHashMap.scala :
> {code}
> class OpenHashMap[K : ClassTag, @specialized(Long, Int, Double) V: ClassTag](
> initialCapacity: Int)
> {code}
> Float should be added to the @specialized annotation as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10074) Include Float in @specialized annotation

2015-08-18 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701332#comment-14701332
 ] 

Ted Yu commented on SPARK-10074:


I performed the following search in the Spark codebase:
{code}
find . -name '*.scala' -exec grep 'MutablePair.*Double' {} \; -print
{code}
There is only one match:
{code}
case class MutablePair[@specialized(Int, Long, Double, Char, Boolean/* , AnyRef 
*/) T1,
./core/src/main/scala/org/apache/spark/util/MutablePair.scala
{code}
I think adding Float would provide parity with Double, potentially benefiting 
future use.

> Include Float in @specialized annotation
> 
>
> Key: SPARK-10074
> URL: https://issues.apache.org/jira/browse/SPARK-10074
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Ted Yu
>Priority: Minor
>
> There are several places in the Spark codebase where we use the @specialized 
> annotation covering Long and Double.
> e.g. in OpenHashMap.scala :
> {code}
> class OpenHashMap[K : ClassTag, @specialized(Long, Int, Double) V: ClassTag](
> initialCapacity: Int)
> {code}
> Float should be added to the @specialized annotation as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10074) Include Float in @specialized annotation

2015-08-17 Thread Ted Yu (JIRA)
Ted Yu created SPARK-10074:
--

 Summary: Include Float in @specialized annotation
 Key: SPARK-10074
 URL: https://issues.apache.org/jira/browse/SPARK-10074
 Project: Spark
  Issue Type: Improvement
Reporter: Ted Yu
Priority: Minor


There are several places in the Spark codebase where we use the @specialized 
annotation covering Long and Double.
e.g. in OpenHashMap.scala :
{code}
class OpenHashMap[K : ClassTag, @specialized(Long, Int, Double) V: ClassTag](
initialCapacity: Int)
{code}
Float should be added to the @specialized annotation as well.
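
A sketch of the proposed change, applied to the OpenHashMap example above:
{code}
import scala.reflect.ClassTag

// Float joins the specialization list so Float values also avoid boxing
class OpenHashMap[K : ClassTag, @specialized(Long, Int, Double, Float) V: ClassTag](
    initialCapacity: Int)
{code}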



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9446) Clear Active SparkContext in stop() method

2015-07-29 Thread Ted Yu (JIRA)
Ted Yu created SPARK-9446:
-

 Summary: Clear Active SparkContext in stop() method
 Key: SPARK-9446
 URL: https://issues.apache.org/jira/browse/SPARK-9446
 Project: Spark
  Issue Type: Bug
Reporter: Ted Yu


In the thread 'stopped SparkContext remaining active' on the mailing list, Andres 
observed the following in the driver log:
{code}
15/07/29 15:17:09 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: 
ApplicationMaster has disassociated: 
15/07/29 15:17:09 INFO YarnClientSchedulerBackend: Shutting down all executors
Exception in thread "Yarn application state monitor" 
org.apache.spark.SparkException: Error asking standalone scheduler to shut down 
executors
at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.stopExecutors(CoarseGrainedSchedulerBackend.scala:261)
at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.stop(CoarseGrainedSchedulerBackend.scala:266)
at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:158)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:416)
at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1411)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1644)
at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:139)
Caused by: java.lang.InterruptedException
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1325)
at 
scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:190)
15/07/29 15:17:09 INFO YarnClientSchedulerBackend: Asking each executor to shut down

at 
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
at 
org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
at 
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.stopExecutors(CoarseGrainedSchedulerBackend.scala:257)
... 6 more
{code}
The effect of the above exception is that a stopped SparkContext is returned to 
the user, since SparkContext.clearActiveContext() is not called.
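
A hedged sketch of the fix direction (shutDownSchedulerAndBackend is a 
placeholder for the existing shutdown steps, not a real method):
{code}
// ensure the active-context slot is cleared even when shutdown throws
def stop(): Unit = {
  try {
    shutDownSchedulerAndBackend()  // placeholder for the current stop() body
  } finally {
    SparkContext.clearActiveContext()
  }
}
{code}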



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5427) Add support for floor function in Spark SQL

2015-07-08 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-5427:
--
Description: 
floor() function is supported in Hive SQL.
This issue is to add floor() function to Spark SQL.
Related thread: http://search-hadoop.com/m/JW1q563fc22

  was:
floor() function is supported in Hive SQL.

This issue is to add floor() function to Spark SQL.
Related thread: http://search-hadoop.com/m/JW1q563fc22


> Add support for floor function in Spark SQL
> ---
>
> Key: SPARK-5427
> URL: https://issues.apache.org/jira/browse/SPARK-5427
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Ted Yu
>  Labels: math
>
> floor() function is supported in Hive SQL.
> This issue is to add floor() function to Spark SQL.
> Related thread: http://search-hadoop.com/m/JW1q563fc22
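
A hedged usage sketch once floor() is wired in (mirroring Hive's semantics):
{code}
// assumed behavior after the change
sqlContext.sql("SELECT floor(3.7)").show()  // expected value: 3
{code}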



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5427) Add support for floor function in Spark SQL

2015-07-01 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-5427:
--
Description: 
floor() function is supported in Hive SQL.

This issue is to add floor() function to Spark SQL.
Related thread: http://search-hadoop.com/m/JW1q563fc22

  was:
floor() function is supported in Hive SQL.
This issue is to add floor() function to Spark SQL.
Related thread: http://search-hadoop.com/m/JW1q563fc22


> Add support for floor function in Spark SQL
> ---
>
> Key: SPARK-5427
> URL: https://issues.apache.org/jira/browse/SPARK-5427
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Ted Yu
>  Labels: math
>
> floor() function is supported in Hive SQL.
> This issue is to add floor() function to Spark SQL.
> Related thread: http://search-hadoop.com/m/JW1q563fc22



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8336) Fix NullPointerException with functions.rand()

2015-06-12 Thread Ted Yu (JIRA)
Ted Yu created SPARK-8336:
-

 Summary: Fix NullPointerException with functions.rand()
 Key: SPARK-8336
 URL: https://issues.apache.org/jira/browse/SPARK-8336
 Project: Spark
  Issue Type: Bug
Reporter: Ted Yu


The problem was first reported by Justin Yip in the thread 
'NullPointerException with functions.rand()'.

Here is how to reproduce the problem:
{code}
sqlContext.createDataFrame(Seq((1,2), (3, 100))).withColumn("index", 
rand(30)).show()
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5427) Add support for floor function in Spark SQL

2015-06-03 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-5427:
--
Description: 
floor() function is supported in Hive SQL.
This issue is to add floor() function to Spark SQL.
Related thread: http://search-hadoop.com/m/JW1q563fc22

  was:
floor() function is supported in Hive SQL.

This issue is to add floor() function to Spark SQL.
Related thread: http://search-hadoop.com/m/JW1q563fc22


> Add support for floor function in Spark SQL
> ---
>
> Key: SPARK-5427
> URL: https://issues.apache.org/jira/browse/SPARK-5427
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Ted Yu
>  Labels: math
>
> floor() function is supported in Hive SQL.
> This issue is to add floor() function to Spark SQL.
> Related thread: http://search-hadoop.com/m/JW1q563fc22



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5427) Add support for floor function in Spark SQL

2015-05-26 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-5427:
--
Description: 
floor() function is supported in Hive SQL.

This issue is to add floor() function to Spark SQL.
Related thread: http://search-hadoop.com/m/JW1q563fc22

  was:
floor() function is supported in Hive SQL.
This issue is to add floor() function to Spark SQL.
Related thread: http://search-hadoop.com/m/JW1q563fc22


> Add support for floor function in Spark SQL
> ---
>
> Key: SPARK-5427
> URL: https://issues.apache.org/jira/browse/SPARK-5427
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Ted Yu
>  Labels: math
>
> floor() function is supported in Hive SQL.
> This issue is to add floor() function to Spark SQL.
> Related thread: http://search-hadoop.com/m/JW1q563fc22



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7853) ClassNotFoundException for SparkSQL

2015-05-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558316#comment-14558316
 ] 

Ted Yu commented on SPARK-7853:
---

The subject says ClassNotFoundException.
Which class couldn't be found?

> ClassNotFoundException for SparkSQL
> ---
>
> Key: SPARK-7853
> URL: https://issues.apache.org/jira/browse/SPARK-7853
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0
>Reporter: Cheng Hao
>Priority: Blocker
>
> Reproduce steps:
> {code}
> bin/spark-sql --jars ./sql/data/files/TestSerDe.jar
> spark-sql> CREATE TABLE alter1(a INT, b INT) ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.serde2.TestSerDe';
> {code}
> Throws Exception like:
> {panel}
> 15/05/25 01:33:35 ERROR thriftserver.SparkSQLDriver: Failed in [CREATE TABLE 
> alter1(a INT, b INT) ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.serde2.TestSerDe']
> org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution 
> Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot 
> validate serde: org.apache.hadoop.hive.serde2.TestSerDe
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:333)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:310)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:139)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:310)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:300)
>   at 
> org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:457)
>   at 
> org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
>   at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
>   at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
>   at 
> org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:922)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:922)
>   at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:147)
>   at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:131)
>   at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
>   at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:727)
>   at 
> org.apache.spark.sql.hive.thriftserver.AbstractSparkSQLDriver.run(AbstractSparkSQLDriver.scala:57)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:283)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:218)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {panel}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7538) Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge

2015-05-12 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540083#comment-14540083
 ] 

Ted Yu commented on SPARK-7538:
---

As mentioned by Cody Koeninger on the mailing list, using 
spark-streaming-kafka-assembly_2.10:1.3.1 would resolve the issue:
{code}
$ jar tvf ~/Downloads/spark-streaming-kafka-assembly_2.10-1.3.1.jar | grep 
yammer | grep Gauge
  1329 Sat Apr 11 04:25:50 PDT 2015 com/yammer/metrics/core/Gauge.class
{code}

> Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge
> ---
>
> Key: SPARK-7538
> URL: https://issues.apache.org/jira/browse/SPARK-7538
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.3.1
> Environment: Ubuntu 14.04 LTS
> java version "1.7.0_79"
> OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
> OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)
> Spark 1.3.1 release.
>Reporter: Lee McFadden
>
> We have a simple streaming job, the components of which work fine in a batch 
> environment reading from a cassandra table as the source.
> We adapted it to work with streaming using the Python libs.
> Submit command line:
> {code}
> /home/ubuntu/spark/spark-1.3.1/bin/spark-submit \
> --packages 
> TargetHolding/pyspark-cassandra:0.1.4,org.apache.spark:spark-streaming-kafka_2.10:1.3.1
>  \
> --conf 
> spark.cassandra.connection.host=10.10.103.172,10.10.102.160,10.10.101.79 \
> --master spark://127.0.0.1:7077 \
> affected_hosts.py
> {code}
> When we run the streaming job everything starts just fine, then we see the 
> following in the logs:
> {code}
> 15/05/11 19:50:46 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 70, 
> ip-10-10-102-53.us-west-2.compute.internal): java.lang.NoClassDefFoundError: 
> com/yammer/metrics/core/Gauge
> at 
> kafka.consumer.ZookeeperConsumerConnector.createFetcher(ZookeeperConsumerConnector.scala:151)
> at 
> kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:115)
> at 
> kafka.consumer.ZookeeperConsumerConnector.<init>(ZookeeperConsumerConnector.scala:128)
> at kafka.consumer.Consumer$.create(ConsumerConnector.scala:89)
> at 
> org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:100)
> at 
> org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:121)
> at 
> org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:106)
> at 
> org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$8.apply(ReceiverTracker.scala:298)
> at 
> org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$8.apply(ReceiverTracker.scala:290)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1498)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1498)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> at org.apache.spark.scheduler.Task.run(Task.scala:64)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: com.yammer.metrics.core.Gauge
> at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 17 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4066) Make whether maven builds fails on scalastyle violation configurable

2015-05-11 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-4066:
--
Description: 
Here is the thread Koert started:

http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bit&subj=scalastyle+annoys+me+a+little+bit

It would be more flexible if whether the maven build fails due to scalastyle 
violations were configurable.

  was:
Here is the thread Koert started:

http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bit&subj=scalastyle+annoys+me+a+little+bit


It would be more flexible if whether the maven build fails due to scalastyle 
violations were configurable.


> Make whether maven builds fails on scalastyle violation configurable
> 
>
> Key: SPARK-4066
> URL: https://issues.apache.org/jira/browse/SPARK-4066
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Ted Yu
>Priority: Minor
>  Labels: style
> Attachments: spark-4066-v1.txt
>
>
> Here is the thread Koert started:
> http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bit&subj=scalastyle+annoys+me+a+little+bit
> It would be more flexible if whether the maven build fails due to scalastyle 
> violations were configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5427) Add support for floor function in Spark SQL

2015-05-11 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-5427:
--
Description: 
floor() function is supported in Hive SQL.
This issue is to add floor() function to Spark SQL.
Related thread: http://search-hadoop.com/m/JW1q563fc22

  was:
floor() function is supported in Hive SQL.
This issue is to add floor() function to Spark SQL.

Related thread: http://search-hadoop.com/m/JW1q563fc22


> Add support for floor function in Spark SQL
> ---
>
> Key: SPARK-5427
> URL: https://issues.apache.org/jira/browse/SPARK-5427
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Ted Yu
>  Labels: math
>
> floor() function is supported in Hive SQL.
> This issue is to add floor() function to Spark SQL.
> Related thread: http://search-hadoop.com/m/JW1q563fc22



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7450) Use UNSAFE.getLong() to speed up BitSetMethods#anySet()

2015-05-07 Thread Ted Yu (JIRA)
Ted Yu created SPARK-7450:
-

 Summary: Use UNSAFE.getLong() to speed up BitSetMethods#anySet()
 Key: SPARK-7450
 URL: https://issues.apache.org/jira/browse/SPARK-7450
 Project: Spark
  Issue Type: Improvement
Reporter: Ted Yu


Currently BitSetMethods#anySet() traverses the BitSet one byte at a time.
We can use UNSAFE.getLong() to speed it up.
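
A minimal standalone sketch of the word-at-a-time scan (the real BitSetMethods 
code is Java and differs in details):
{code}
// test 8 bytes per UNSAFE read instead of 1
def anySet(unsafe: sun.misc.Unsafe, baseObject: AnyRef,
           baseOffset: Long, wordCount: Int): Boolean = {
  var i = 0
  while (i < wordCount) {
    if (unsafe.getLong(baseObject, baseOffset + i * 8L) != 0L) return true
    i += 1
  }
  false
}
{code}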



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5041) hive-exec jar should be generated with JDK 6

2015-05-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523503#comment-14523503
 ] 

Ted Yu commented on SPARK-5041:
---

Considering the '[discuss] ending support for Java 6' discussion on the mailing 
list, it looks like there is no need to do this anymore.

> hive-exec jar should be generated with JDK 6
> 
>
> Key: SPARK-5041
> URL: https://issues.apache.org/jira/browse/SPARK-5041
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Ted Yu
>  Labels: jdk1.7, maven
>
> Shixiong Zhu first reported the issue where hive-exec-0.12.0-protobuf-2.5.jar 
> cannot be used by a Spark program running JDK 6.
> See http://search-hadoop.com/m/JW1q5YLCNN
> hive-exec-0.12.0-protobuf-2.5.jar was generated with JDK 7. It should be 
> generated with JDK 6.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4066) Make whether maven builds fails on scalastyle violation configurable

2015-04-24 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-4066:
--
Description: 
Here is the thread Koert started:

http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bit&subj=scalastyle+annoys+me+a+little+bit


It would be flexible if whether maven build fails due to scalastyle violation 
configurable.

  was:
Here is the thread Koert started:

http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bit&subj=scalastyle+annoys+me+a+little+bit

It would be flexible if whether maven build fails due to scalastyle violation 
configurable.


> Make whether maven builds fails on scalastyle violation configurable
> 
>
> Key: SPARK-4066
> URL: https://issues.apache.org/jira/browse/SPARK-4066
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Ted Yu
>Priority: Minor
>  Labels: style
> Attachments: spark-4066-v1.txt
>
>
> Here is the thread Koert started:
> http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bit&subj=scalastyle+annoys+me+a+little+bit
> It would be flexible if whether maven build fails due to scalastyle violation 
> configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7107) Add parameter for zookeeper.znode.parent to hbase_inputformat.py

2015-04-23 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-7107:
--
Description: 
[~yeshavora] first reported encountering the following exception running 
hbase_inputformat.py:
{code}
py4j.protocol.Py4JJavaError: An error occurred while calling 
z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: java.lang.RuntimeException: java.lang.NullPointerException
at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:208)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:313)
at 
org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:288)
at 
org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
{code}
It turned out that the HBase cluster has a custom znode parent:
{code}
<property>
  <name>zookeeper.znode.parent</name>
  <value>/hbase-unsecure</value>
</property>
{code}
hbase_inputformat.py should support specification of a custom znode parent.

  was:
We encountered the following exception running hbase_inputformat.py:
{code}
py4j.protocol.Py4JJavaError: An error occurred while calling 
z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: java.lang.RuntimeException: java.lang.NullPointerException
at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:208)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:313)
at 
org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:288)
at 
org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
{code}
It turned out that the HBase cluster has a custom znode parent:
{code}
<property>
  <name>zookeeper.znode.parent</name>
  <value>/hbase-unsecure</value>
</property>
{code}
hbase_inputformat.py should support specification of a custom znode parent.


> Add parameter for zookeeper.znode.parent to hbase_inputformat.py
> 
>
> Key: SPARK-7107
> URL: https://issues.apache.org/jira/browse/SPARK-7107
> Project: Spark
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> [~yeshavora] first reported encountering the following exception running 
> hbase_inputformat.py:
> {code}
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
> : java.lang.RuntimeException: java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:208)
> at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:313)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:288)
> at 
> org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
> {code}
> It turned out that the HBase cluster has a custom znode parent:
> {code}
> <property>
>   <name>zookeeper.znode.parent</name>
>   <value>/hbase-unsecure</value>
> </property>
> {code}
> hbase_inputformat.py should support specification of a custom znode parent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7107) Add parameter for zookeeper.znode.parent to hbase_inputformat.py

2015-04-23 Thread Ted Yu (JIRA)
Ted Yu created SPARK-7107:
-

 Summary: Add parameter for zookeeper.znode.parent to 
hbase_inputformat.py
 Key: SPARK-7107
 URL: https://issues.apache.org/jira/browse/SPARK-7107
 Project: Spark
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


We encountered the following exception running hbase_inputformat.py:
{code}
py4j.protocol.Py4JJavaError: An error occurred while calling 
z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: java.lang.RuntimeException: java.lang.NullPointerException
at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:208)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:313)
at 
org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:288)
at 
org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
{code}
It turned out that the HBase cluster has a custom znode parent:
{code}
<property>
  <name>zookeeper.znode.parent</name>
  <value>/hbase-unsecure</value>
</property>
{code}
hbase_inputformat.py should support specification of a custom znode parent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5427) Add support for floor function in Spark SQL

2015-04-16 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-5427:
--
Description: 
floor() function is supported in Hive SQL.
This issue is to add floor() function to Spark SQL.

Related thread: http://search-hadoop.com/m/JW1q563fc22

  was:
floor() function is supported in Hive SQL.

This issue is to add floor() function to Spark SQL.

Related thread: http://search-hadoop.com/m/JW1q563fc22


> Add support for floor function in Spark SQL
> ---
>
> Key: SPARK-5427
> URL: https://issues.apache.org/jira/browse/SPARK-5427
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Ted Yu
>  Labels: math
>
> floor() function is supported in Hive SQL.
> This issue is to add floor() function to Spark SQL.
> Related thread: http://search-hadoop.com/m/JW1q563fc22



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6085) Increase default value for memory overhead

2015-03-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342398#comment-14342398
 ] 

Ted Yu commented on SPARK-6085:
---

In my opinion, the priority for this JIRA should be Major.

Users who deploy Spark on YARN in production are highly likely to hit 
computation failures, which would impact their business. Without intimate 
knowledge of Spark, it would take them some time to figure out the root cause.

> Increase default value for memory overhead
> --
>
> Key: SPARK-6085
> URL: https://issues.apache.org/jira/browse/SPARK-6085
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Ted Yu
>Priority: Minor
>
> Several users have reported that the current default memory overhead value 
> resulted in failed computations in Spark on YARN.
> See this thread:
> http://search-hadoop.com/m/JW1q58FDel
> Increasing the default value for memory overhead would improve the 
> out-of-the-box user experience.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6085) Increase default value for memory overhead

2015-02-28 Thread Ted Yu (JIRA)
Ted Yu created SPARK-6085:
-

 Summary: Increase default value for memory overhead
 Key: SPARK-6085
 URL: https://issues.apache.org/jira/browse/SPARK-6085
 Project: Spark
  Issue Type: Improvement
Reporter: Ted Yu


Several users have reported that the current default memory overhead value 
resulted in failed computations in Spark on YARN.
See this thread:
http://search-hadoop.com/m/JW1q58FDel

Increasing the default value for memory overhead would improve the 
out-of-the-box user experience.
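
For context, a sketch of the rule that derives the overhead from executor 
memory; the 7% factor and 384 MB floor are assumptions based on my reading 
of the then-current defaults and may differ by version:
{code}
// Sketch of how the YARN memory overhead is computed from executor memory.
// Constants are assumptions, not authoritative values.
val MEMORY_OVERHEAD_FACTOR = 0.07
val MEMORY_OVERHEAD_MIN = 384 // MB

def memoryOverhead(executorMemoryMb: Int): Int =
  math.max((MEMORY_OVERHEAD_FACTOR * executorMemoryMb).toInt, MEMORY_OVERHEAD_MIN)

// A 4 GB executor gets max(286, 384) = 384 MB of overhead, which users
// found too low; YARN then kills containers that exceed their allocation.
println(memoryOverhead(4096)) // 384
{code}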



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5427) Add support for floor function in Spark SQL

2015-02-28 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-5427:
--
Labels: math  (was: )

> Add support for floor function in Spark SQL
> ---
>
> Key: SPARK-5427
> URL: https://issues.apache.org/jira/browse/SPARK-5427
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Ted Yu
>  Labels: math
>
> floor() function is supported in Hive SQL.
> This issue is to add floor() function to Spark SQL.
> Related thread: http://search-hadoop.com/m/JW1q563fc22



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6045) RecordWriter should be checked against null in PairRDDFunctions#saveAsNewAPIHadoopDataset

2015-02-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339169#comment-14339169
 ] 

Ted Yu commented on SPARK-6045:
---

The logic of CassandraHadoopMigrator.scala is unknown.

With the current PR, the user would be able to see the exception earlier so 
that he / she can perform proper analysis.

> RecordWriter should be checked against null in 
> PairRDDFunctions#saveAsNewAPIHadoopDataset
> -
>
> Key: SPARK-6045
> URL: https://issues.apache.org/jira/browse/SPARK-6045
> Project: Spark
>  Issue Type: Bug
>Reporter: Ted Yu
>
> gtinside reported in the thread 'NullPointerException in TaskSetManager' with 
> the following stack trace:
> {code}
> WARN 2015-02-26 14:21:43,217 [task-result-getter-0] TaskSetManager - Lost
> task 14.2 in stage 0.0 (TID 29, devntom003.dev.blackrock.com):
> java.lang.NullPointerException
> org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1007)
> com.bfm.spark.test.CassandraHadoopMigrator$.main(CassandraHadoopMigrator.scala:77)
> com.bfm.spark.test.CassandraHadoopMigrator.main(CassandraHadoopMigrator.scala)
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> java.lang.reflect.Method.invoke(Method.java:606)
> org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}
> Looks like the following call in the finally block was the cause:
> {code}
> writer.close(hadoopContext)
> {code}
> We should check writer against null before calling close().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6045) RecordWriter should be checked against null in PairRDDFunctions#saveAsNewAPIHadoopDataset

2015-02-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339147#comment-14339147
 ] 

Ted Yu commented on SPARK-6045:
---

https://github.com/apache/spark/pull/4794

> RecordWriter should be checked against null in 
> PairRDDFunctions#saveAsNewAPIHadoopDataset
> -
>
> Key: SPARK-6045
> URL: https://issues.apache.org/jira/browse/SPARK-6045
> Project: Spark
>  Issue Type: Bug
>Reporter: Ted Yu
>
> gtinside reported in the thread 'NullPointerException in TaskSetManager' with 
> the following stack trace:
> {code}
> WARN 2015-02-26 14:21:43,217 [task-result-getter-0] TaskSetManager - Lost
> task 14.2 in stage 0.0 (TID 29, devntom003.dev.blackrock.com):
> java.lang.NullPointerException
> org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1007)
> com.bfm.spark.test.CassandraHadoopMigrator$.main(CassandraHadoopMigrator.scala:77)
> com.bfm.spark.test.CassandraHadoopMigrator.main(CassandraHadoopMigrator.scala)
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> java.lang.reflect.Method.invoke(Method.java:606)
> org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}
> Looks like the following call in the finally block was the cause:
> {code}
> writer.close(hadoopContext)
> {code}
> We should check writer against null before calling close().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6045) RecordWriter should be checked against null in PairRDDFunctions#saveAsNewAPIHadoopDataset

2015-02-26 Thread Ted Yu (JIRA)
Ted Yu created SPARK-6045:
-

 Summary: RecordWriter should be checked against null in 
PairRDDFunctions#saveAsNewAPIHadoopDataset
 Key: SPARK-6045
 URL: https://issues.apache.org/jira/browse/SPARK-6045
 Project: Spark
  Issue Type: Bug
Reporter: Ted Yu


gtinside reported in the thread 'NullPointerException in TaskSetManager' with 
the following stack trace:
{code}
WARN 2015-02-26 14:21:43,217 [task-result-getter-0] TaskSetManager - Lost
task 14.2 in stage 0.0 (TID 29, devntom003.dev.blackrock.com):
java.lang.NullPointerException

org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1007)
com.bfm.spark.test.CassandraHadoopMigrator$.main(CassandraHadoopMigrator.scala:77)
com.bfm.spark.test.CassandraHadoopMigrator.main(CassandraHadoopMigrator.scala)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:606)
org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{code}
Looks like the following call in the finally block was the cause:
{code}
writer.close(hadoopContext)
{code}
We should check writer against null before calling close().
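
A minimal sketch of the proposed guard; variable names mirror the snippet 
above and the surrounding write loop is elided:
{code}
// Inside saveAsNewAPIHadoopDataset, where K and V are the pair RDD's types.
var writer: RecordWriter[K, V] = null
try {
  writer = format.getRecordWriter(hadoopContext)
  // ... write the partition's records ...
} finally {
  // If getRecordWriter failed, writer is still null; closing it blindly
  // yields the NullPointerException in the stack trace above.
  if (writer != null) {
    writer.close(hadoopContext)
  }
}
{code}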



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5427) Add support for floor function in Spark SQL

2015-01-27 Thread Ted Yu (JIRA)
Ted Yu created SPARK-5427:
-

 Summary: Add support for floor function in Spark SQL
 Key: SPARK-5427
 URL: https://issues.apache.org/jira/browse/SPARK-5427
 Project: Spark
  Issue Type: Improvement
Reporter: Ted Yu


floor() function is supported in Hive SQL.

This issue is to add floor() function to Spark SQL.

Related thread: http://search-hadoop.com/m/JW1q563fc22
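
For reference, the intended usage once the function is registered, assuming 
Hive's semantics (largest integral value not greater than the argument); the 
table and column names below are placeholders:
{code}
// Hypothetical usage after floor() is added to Spark SQL.
sqlContext.sql("SELECT floor(price) FROM items").collect()
// floor(3.7) == 3, floor(-3.7) == -4
{code}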



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1714) Take advantage of AMRMClient APIs to simplify logic in YarnAllocationHandler

2015-01-21 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286416#comment-14286416
 ] 

Ted Yu commented on SPARK-1714:
---

allocatedHostToContainersMap.synchronized is absent for the following operation 
in runAllocatedContainers():
{code}
  val containerSet = 
allocatedHostToContainersMap.getOrElseUpdate(executorHostname,
new HashSet[ContainerId])

  containerSet += containerId
  allocatedContainerToHostMap.put(containerId, executorHostname)
{code}
Is that intentional?
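
A sketch of the synchronized form being suggested, reusing the names from 
the snippet above:
{code}
// Wrap both map updates in the same lock that guards other accesses to
// allocatedHostToContainersMap, so the two structures stay consistent.
allocatedHostToContainersMap.synchronized {
  val containerSet = allocatedHostToContainersMap.getOrElseUpdate(
    executorHostname, new HashSet[ContainerId])
  containerSet += containerId
  allocatedContainerToHostMap.put(containerId, executorHostname)
}
{code}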

> Take advantage of AMRMClient APIs to simplify logic in YarnAllocationHandler
> 
>
> Key: SPARK-1714
> URL: https://issues.apache.org/jira/browse/SPARK-1714
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 1.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1714) Take advantage of AMRMClient APIs to simplify logic in YarnAllocationHandler

2015-01-21 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286407#comment-14286407
 ] 

Ted Yu commented on SPARK-1714:
---

{code}
if (completedContainer.getExitStatus == -103) { // vmem limit exceeded
{code}
Should ContainerExitStatus#KILLED_EXCEEDED_VMEM be referenced above?
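
That would make the intent explicit; a sketch assuming Hadoop's 
ContainerExitStatus is on the classpath:
{code}
import org.apache.hadoop.yarn.api.records.ContainerExitStatus

// KILLED_EXCEEDED_VMEM is -103; the named constant documents the intent.
if (completedContainer.getExitStatus == ContainerExitStatus.KILLED_EXCEEDED_VMEM) {
  // vmem limit exceeded
}
{code}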

> Take advantage of AMRMClient APIs to simplify logic in YarnAllocationHandler
> 
>
> Key: SPARK-1714
> URL: https://issues.apache.org/jira/browse/SPARK-1714
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 1.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5041) hive-exec jar should be generated with JDK 6

2015-01-18 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-5041:
--
Labels: jdk1.7 maven  (was: maven)

> hive-exec jar should be generated with JDK 6
> 
>
> Key: SPARK-5041
> URL: https://issues.apache.org/jira/browse/SPARK-5041
> Project: Spark
>  Issue Type: Bug
>Reporter: Ted Yu
>  Labels: jdk1.7, maven
>
> Shixiong Zhu first reported the issue where hive-exec-0.12.0-protobuf-2.5.jar 
> cannot be used by a Spark program running on JDK 6.
> See http://search-hadoop.com/m/JW1q5YLCNN
> hive-exec-0.12.0-protobuf-2.5.jar was generated with JDK 7. It should be 
> generated with JDK 6.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5041) hive-exec jar should be generated with JDK 6

2015-01-09 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-5041:
--
Labels: maven  (was: )

> hive-exec jar should be generated with JDK 6
> 
>
> Key: SPARK-5041
> URL: https://issues.apache.org/jira/browse/SPARK-5041
> Project: Spark
>  Issue Type: Bug
>Reporter: Ted Yu
>  Labels: maven
>
> Shixiong Zhu first reported the issue where hive-exec-0.12.0-protobuf-2.5.jar 
> cannot be used by a Spark program running on JDK 6.
> See http://search-hadoop.com/m/JW1q5YLCNN
> hive-exec-0.12.0-protobuf-2.5.jar was generated with JDK 7. It should be 
> generated with JDK 6.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5041) hive-exec jar should be generated with JDK 6

2014-12-31 Thread Ted Yu (JIRA)
Ted Yu created SPARK-5041:
-

 Summary: hive-exec jar should be generated with JDK 6
 Key: SPARK-5041
 URL: https://issues.apache.org/jira/browse/SPARK-5041
 Project: Spark
  Issue Type: Bug
Reporter: Ted Yu


Shixiong Zhu first reported the issue where hive-exec-0.12.0-protobuf-2.5.jar 
cannot be used by a Spark program running on JDK 6.
See http://search-hadoop.com/m/JW1q5YLCNN

hive-exec-0.12.0-protobuf-2.5.jar was generated with JDK 7. It should be 
generated with JDK 6.
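
One way to verify which JDK a jar was compiled with is to read the class-file 
major version (50 = Java 6, 51 = Java 7); a hypothetical check:
{code}
import java.io.DataInputStream
import java.util.jar.JarFile
import scala.collection.JavaConverters._

// Read the first .class entry and report its class-file major version.
val jar = new JarFile("hive-exec-0.12.0-protobuf-2.5.jar")
val entry = jar.entries.asScala.find(_.getName.endsWith(".class")).get
val in = new DataInputStream(jar.getInputStream(entry))
require(in.readInt() == 0xCAFEBABE) // class-file magic
in.readUnsignedShort()              // minor version
val major = in.readUnsignedShort()  // 50 = Java 6, 51 = Java 7
println(s"class file major version: $major")
in.close()
{code}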



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1127) Add saveAsHBase to PairRDDFunctions

2014-12-02 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1423#comment-1423
 ] 

Ted Yu commented on SPARK-1127:
---

According to Reynold, the first half of the external data source API (for 
reading but not writing) is already in 1.2:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala
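
A rough sketch of the read-side hook defined there; package paths and 
signatures are from memory of later releases and may differ slightly in 1.2, 
and DemoRelation is a placeholder where an HBase-backed relation would plug in:
{code}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types._

// A trivial relation that scans hard-coded rows; a real HBase source
// would build the RDD from a table scan instead.
class DemoRelation(@transient val sqlContext: SQLContext)
    extends BaseRelation with TableScan {
  override def schema: StructType =
    StructType(StructField("rowkey", StringType) :: Nil)
  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.parallelize(Seq(Row("k1"), Row("k2")))
}

class DefaultSource extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    new DemoRelation(sqlContext)
}
{code}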

> Add saveAsHBase to PairRDDFunctions
> ---
>
> Key: SPARK-1127
> URL: https://issues.apache.org/jira/browse/SPARK-1127
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: haosdent huang
>Assignee: haosdent huang
> Fix For: 1.2.0
>
>
> Support to save data in HBase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-4455) Exclude dependency on hbase-annotations module

2014-11-17 Thread Ted Yu (JIRA)
Ted Yu created SPARK-4455:
-

 Summary: Exclude dependency on hbase-annotations module
 Key: SPARK-4455
 URL: https://issues.apache.org/jira/browse/SPARK-4455
 Project: Spark
  Issue Type: Bug
Reporter: Ted Yu


As Patrick mentioned in the thread 'Has anyone else observed this build break?':

The error I've seen is this when building the examples project:
{code}
spark-examples_2.10: Could not resolve dependencies for project
org.apache.spark:spark-examples_2.10:jar:1.2.0-SNAPSHOT: Could not
find artifact jdk.tools:jdk.tools:jar:1.7 at specified path
/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/../lib/tools.jar
{code}
The reason for this error is that hbase-annotations uses a
"system"-scoped dependency in its pom, and this
doesn't work with certain JDK layouts such as the one provided on Mac OS:

http://central.maven.org/maven2/org/apache/hbase/hbase-annotations/0.98.7-hadoop2/hbase-annotations-0.98.7-hadoop2.pom

The hbase-annotations module is transitively brought in through other HBase 
modules; we should exclude it from the related modules.
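
A sketch of the kind of exclusion that could be applied, shown for a single 
HBase module; the module chosen and the version property are placeholders, 
and the same stanza would be repeated for each module that pulls it in:
{code}
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>${hbase.version}</version>
  <exclusions>
    <!-- hbase-annotations declares a system-scoped jdk.tools dependency
         that breaks resolution on some JDK layouts (e.g. Mac OS) -->
    <exclusion>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-annotations</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{code}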



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1297) Upgrade HBase dependency to 0.98.0

2014-11-05 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198701#comment-14198701
 ] 

Ted Yu commented on SPARK-1297:
---

Created a new pull request:
https://github.com/apache/spark/pull/3115

> Upgrade HBase dependency to 0.98.0
> --
>
> Key: SPARK-1297
> URL: https://issues.apache.org/jira/browse/SPARK-1297
> Project: Spark
>  Issue Type: Task
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Attachments: pom.xml, spark-1297-v2.txt, spark-1297-v4.txt, 
> spark-1297-v5.txt, spark-1297-v6.txt, spark-1297-v7.txt
>
>
> HBase 0.94.6 was released 11 months ago.
> Upgrade HBase dependency to 0.98.0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1297) Upgrade HBase dependency to 0.98.0

2014-11-05 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198510#comment-14198510
 ] 

Ted Yu commented on SPARK-1297:
---

Patch v7 uses the 0.98.7 HBase release.

> Upgrade HBase dependency to 0.98.0
> --
>
> Key: SPARK-1297
> URL: https://issues.apache.org/jira/browse/SPARK-1297
> Project: Spark
>  Issue Type: Task
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Attachments: pom.xml, spark-1297-v2.txt, spark-1297-v4.txt, 
> spark-1297-v5.txt, spark-1297-v6.txt, spark-1297-v7.txt
>
>
> HBase 0.94.6 was released 11 months ago.
> Upgrade HBase dependency to 0.98.0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-1297) Upgrade HBase dependency to 0.98.0

2014-11-05 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated SPARK-1297:
--
Attachment: spark-1297-v7.txt

> Upgrade HBase dependency to 0.98.0
> --
>
> Key: SPARK-1297
> URL: https://issues.apache.org/jira/browse/SPARK-1297
> Project: Spark
>  Issue Type: Task
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Attachments: pom.xml, spark-1297-v2.txt, spark-1297-v4.txt, 
> spark-1297-v5.txt, spark-1297-v6.txt, spark-1297-v7.txt
>
>
> HBase 0.94.6 was released 11 months ago.
> Upgrade HBase dependency to 0.98.0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4066) Make whether maven build fails on scalastyle violation configurable

2014-10-24 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183016#comment-14183016
 ] 

Ted Yu commented on SPARK-4066:
---

bq. -Dscalastyle.failOnViolation was already a built-in way to control this
See response from Koert:
{noformat}
i tried:
mvn clean package -DskipTests -Dscalastyle.failOnViolation=false

no luck, still get
{noformat}
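
Presumably the property is not wired into the plugin configuration; for the 
flag to take effect, something like the following would be needed (a 
hypothetical sketch — the property name matches the command line above and 
the default is an assumption):
{code}
<properties>
  <scalastyle.failOnViolation>true</scalastyle.failOnViolation>
</properties>
<!-- ... -->
<plugin>
  <groupId>org.scalastyle</groupId>
  <artifactId>scalastyle-maven-plugin</artifactId>
  <configuration>
    <failOnViolation>${scalastyle.failOnViolation}</failOnViolation>
  </configuration>
</plugin>
{code}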
bq. this has to be fixed anyway
I agree that this needs to be done before patch submission.
However, when formulating the patch, such a check can be skipped.

> Make whether maven build fails on scalastyle violation configurable
> 
>
> Key: SPARK-4066
> URL: https://issues.apache.org/jira/browse/SPARK-4066
> Project: Spark
>  Issue Type: Improvement
>Reporter: Ted Yu
>Priority: Minor
> Attachments: spark-4066-v1.txt
>
>
> Here is the thread Koert started:
> http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bit&subj=scalastyle+annoys+me+a+little+bit
> It would be more flexible if whether the maven build fails due to scalastyle 
> violations were configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-4066) Make whether maven build fails on scalastyle violation configurable

2014-10-23 Thread Ted Yu (JIRA)
Ted Yu created SPARK-4066:
-

 Summary: Make whether maven build fails on scalastyle violation 
configurable
 Key: SPARK-4066
 URL: https://issues.apache.org/jira/browse/SPARK-4066
 Project: Spark
  Issue Type: Improvement
Reporter: Ted Yu
Priority: Minor


Here is the thread Koert started:

http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bit&subj=scalastyle+annoys+me+a+little+bit

It would be more flexible if whether the maven build fails due to scalastyle 
violations were configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


