[jira] [Commented] (SPARK-12181) Check Cached unaligned-access capability before using Unsafe
[ https://issues.apache.org/jira/browse/SPARK-12181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15045309#comment-15045309 ] Ted Yu commented on SPARK-12181: We can add a method to Platform which performs a check on the architecture. In MemoryManager.scala, when "spark.unsafe.offHeap" is true but the above check doesn't pass, raise an exception. Comments are welcome. > Check Cached unaligned-access capability before using Unsafe > > > Key: SPARK-12181 > URL: https://issues.apache.org/jira/browse/SPARK-12181 > Project: Spark > Issue Type: Bug >Reporter: Ted Yu > > For MemoryMode.OFF_HEAP, Unsafe.getInt etc. are used with no restriction. > However, the Oracle implementation uses these methods only if the class > variable unaligned (commented as "Cached unaligned-access capability") is > true, which seems to be calculated based on whether the architecture is i386, x86, > amd64, or x86_64. > I think we should perform a similar check for the use of Unsafe. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
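A minimal sketch of such a check, assuming reflection on the JDK-internal java.nio.Bits#unaligned() method (the flag the Oracle implementation consults), with the architecture list as a fallback; the object and method names here are illustrative, not the final Platform API:

{code}
object UnalignedCheck {
  // Architectures the JDK itself treats as supporting unaligned access.
  private val knownArches = Set("i386", "x86", "amd64", "x86_64")

  val unaligned: Boolean =
    try {
      // java.nio.Bits caches the capability in a private static method.
      val bits = Class.forName("java.nio.Bits", false, getClass.getClassLoader)
      val m = bits.getDeclaredMethod("unaligned")
      m.setAccessible(true)
      m.invoke(null).asInstanceOf[Boolean]
    } catch {
      // Fall back to the architecture check when the internal API is unavailable.
      case _: Throwable => knownArches.contains(System.getProperty("os.arch", ""))
    }
}
{code}

MemoryManager could then guard off-heap mode with something like require(UnalignedCheck.unaligned, "spark.unsafe.offHeap requires unaligned-access support").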
[jira] [Created] (SPARK-12181) Check Cached unaligned-access capability before using Unsafe
Ted Yu created SPARK-12181: -- Summary: Check Cached unaligned-access capability before using Unsafe Key: SPARK-12181 URL: https://issues.apache.org/jira/browse/SPARK-12181 Project: Spark Issue Type: Bug Reporter: Ted Yu For MemoryMode.OFF_HEAP, Unsafe.getInt etc. are used with no restriction. However, the Oracle implementation uses these methods only if the class variable unaligned (commented as "Cached unaligned-access capability") is true, which seems to be calculated based on whether the architecture is i386, x86, amd64, or x86_64. I think we should perform a similar check for the use of Unsafe. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12074) Avoid memory copy involving ByteBuffer.wrap(ByteArrayOutputStream.toByteArray)
Ted Yu created SPARK-12074: -- Summary: Avoid memory copy involving ByteBuffer.wrap(ByteArrayOutputStream.toByteArray) Key: SPARK-12074 URL: https://issues.apache.org/jira/browse/SPARK-12074 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Ted Yu SPARK-12060 fixed JavaSerializerInstance.serialize. This issue applies the same technique (via ByteBufferOutputStream) to two other classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
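The copy being avoided comes from ByteArrayOutputStream.toByteArray, which duplicates the internal buffer; a subclass can instead hand the buffer to ByteBuffer.wrap directly. A rough sketch of such a ByteBufferOutputStream (Spark's actual class lives in org.apache.spark.util; details here are a hedged reconstruction):

{code}
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer

class ByteBufferOutputStream extends ByteArrayOutputStream {
  // Wrap the internal buffer instead of copying it via toByteArray.
  // The caller must not write to the stream after taking the buffer.
  def toByteBuffer: ByteBuffer = ByteBuffer.wrap(buf, 0, count)
}
{code}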
[jira] [Updated] (SPARK-10561) Provide tooling for auto-generating Spark SQL reference manual
[ https://issues.apache.org/jira/browse/SPARK-10561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-10561: --- Description: Here is the discussion thread: http://search-hadoop.com/m/q3RTtcD20F1o62xE Richard Hillegas made the following suggestion: A machine-generated BNF, however, is easy to imagine. But perhaps not so easy to implement. Spark's SQL grammar is implemented in Scala, extending the DSL support provided by the Scala language. I am new to programming in Scala, so I don't know whether the Scala ecosystem provides any good tools for reverse-engineering a BNF from a class which extends scala.util.parsing.combinator.syntactical.StandardTokenParsers. was: Here is the discussion thread: http://search-hadoop.com/m/q3RTtcD20F1o62xE Richard Hillegas made the following suggestion: A machine-generated BNF, however, is easy to imagine. But perhaps not so easy to implement. Spark's SQL grammar is implemented in Scala, extending the DSL support provided by the Scala language. I am new to programming in Scala, so I don't know whether the Scala ecosystem provides any good tools for reverse-engineering a BNF from a class which extends scala.util.parsing.combinator.syntactical.StandardTokenParsers. > Provide tooling for auto-generating Spark SQL reference manual > -- > > Key: SPARK-10561 > URL: https://issues.apache.org/jira/browse/SPARK-10561 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Reporter: Ted Yu > > Here is the discussion thread: > http://search-hadoop.com/m/q3RTtcD20F1o62xE > Richard Hillegas made the following suggestion: > A machine-generated BNF, however, is easy to imagine. But perhaps not so easy > to implement. Spark's SQL grammar is implemented in Scala, extending the DSL > support provided by the Scala language. I am new to programming in Scala, so > I don't know whether the Scala ecosystem provides any good tools for > reverse-engineering a BNF from a class which extends > scala.util.parsing.combinator.syntactical.StandardTokenParsers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11206) Support SQL UI on the history server
[ https://issues.apache.org/jira/browse/SPARK-11206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15029183#comment-15029183 ] Ted Yu commented on SPARK-11206: Looks like SQLListenerMemoryLeakSuite fails on maven Jenkins now. e.g. https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/4251/HADOOP_PROFILE=hadoop-2.4,label=spark-test/testReport/org.apache.spark.sql.execution.ui/SQLListenerMemoryLeakSuite/no_memory_leak/ > Support SQL UI on the history server > > > Key: SPARK-11206 > URL: https://issues.apache.org/jira/browse/SPARK-11206 > Project: Spark > Issue Type: New Feature > Components: SQL, Web UI >Reporter: Carson Wang >Assignee: Carson Wang > Fix For: 1.7.0 > > > On the live web UI, there is a SQL tab which provides valuable information > for the SQL query. But once the workload is finished, we won't see the SQL > tab on the history server. It will be helpful if we support SQL UI on the > history server so we can analyze it even after its execution. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-11971) Start py4j callback server for Java Gateway
[ https://issues.apache.org/jira/browse/SPARK-11971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved SPARK-11971. Resolution: Not A Problem The callback server is started in _ensure_initialized of python/pyspark/streaming/context.py > Start py4j callback server for Java Gateway > --- > > Key: SPARK-11971 > URL: https://issues.apache.org/jira/browse/SPARK-11971 > Project: Spark > Issue Type: Improvement >Reporter: Ted Yu >Priority: Minor > > See the thread 'pyspark does not seem to start py4j callback server' > This issue starts the py4j callback server for the Java Gateway -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11971) Start py4j callback server for Java Gateway
Ted Yu created SPARK-11971: -- Summary: Start py4j callback server for Java Gateway Key: SPARK-11971 URL: https://issues.apache.org/jira/browse/SPARK-11971 Project: Spark Issue Type: Improvement Reporter: Ted Yu Priority: Minor See the thread 'pyspark does not seem to start py4j callback server' This issue starts the py4j callback server for the Java Gateway -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11884) Drop multiple columns in the DataFrame API
[ https://issues.apache.org/jira/browse/SPARK-11884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15024837#comment-15024837 ] Ted Yu commented on SPARK-11884: Is there interest in moving forward with the PR? > Drop multiple columns in the DataFrame API > -- > > Key: SPARK-11884 > URL: https://issues.apache.org/jira/browse/SPARK-11884 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Ted Yu >Priority: Minor > > See the thread Ben started: > http://search-hadoop.com/m/q3RTtveEuhjsr7g/ > This issue adds a drop() method to DataFrame which accepts multiple column names -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11884) Drop multiple columns in the DataFrame API
Ted Yu created SPARK-11884: -- Summary: Drop multiple columns in the DataFrame API Key: SPARK-11884 URL: https://issues.apache.org/jira/browse/SPARK-11884 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Ted Yu Priority: Minor See the thread Ben started: http://search-hadoop.com/m/q3RTtveEuhjsr7g/ This issue adds a drop() method to DataFrame which accepts multiple column names -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
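A hedged sketch of the requested API, written as a standalone helper over the existing single-column DataFrame.drop; the varargs signature is the proposal, not necessarily the final method:

{code}
import org.apache.spark.sql.DataFrame

// Hypothetical helper; df.drop(colName: String) already exists on DataFrame.
def dropColumns(df: DataFrame, colNames: String*): DataFrame =
  colNames.foldLeft(df)((acc, col) => acc.drop(col))

// Usage: dropColumns(df, "a", "b", "c")
{code}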
[jira] [Created] (SPARK-11872) Prevent the call to SparkContext#stop() in the listener bus's thread
Ted Yu created SPARK-11872: -- Summary: Prevent the call to SparkContext#stop() in the listener bus's thread Key: SPARK-11872 URL: https://issues.apache.org/jira/browse/SPARK-11872 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Ted Yu This is a continuation of SPARK-11761. Andrew suggested adding this protection; see the tail of PR #9741 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11761) Prevent the call to StreamingContext#stop() in the listener bus's thread
Ted Yu created SPARK-11761: -- Summary: Prevent the call to StreamingContext#stop() in the listener bus's thread Key: SPARK-11761 URL: https://issues.apache.org/jira/browse/SPARK-11761 Project: Spark Issue Type: Bug Components: Streaming Reporter: Ted Yu Quoting Shixiong's comment from https://github.com/apache/spark/pull/9723 : {code} The user should not call stop or other long-time work in a listener since it will block the listener thread, and prevent from stopping SparkContext/StreamingContext. I cannot see an approach since we need to stop the listener bus's thread before stopping SparkContext/StreamingContext totally. {code} The proposed solution is to prevent the call to StreamingContext#stop() in the listener bus's thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
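One way to implement the guard, sketched under the assumption that the bus keeps a reference to its dispatch thread; the class and method names are illustrative, not Spark's actual code:

{code}
class ListenerBusGuard {
  @volatile private var listenerThread: Thread = _

  def startedOn(t: Thread): Unit = { listenerThread = t }

  // Called at the top of StreamingContext#stop(): fail fast instead of
  // deadlocking while waiting for the listener thread to finish.
  def assertNotListenerThread(): Unit = {
    if (Thread.currentThread() eq listenerThread) {
      throw new IllegalStateException(
        "Cannot stop StreamingContext from within the listener bus thread")
    }
  }
}
{code}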
[jira] [Resolved] (SPARK-11572) Exit AsynchronousListenerBus thread when stop() is called
[ https://issues.apache.org/jira/browse/SPARK-11572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved SPARK-11572. Resolution: Won't Fix > Exit AsynchronousListenerBus thread when stop() is called > - > > Key: SPARK-11572 > URL: https://issues.apache.org/jira/browse/SPARK-11572 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Ted Yu > > As vonnagy reported in the following thread: > http://search-hadoop.com/m/q3RTtk982kvIow22 > Attempts to join the thread in AsynchronousListenerBus resulted in a lockup > because the AsynchronousListenerBus thread was still receiving > SparkListenerExecutorMetricsUpdate messages from the DAGScheduler. > The proposed fix is to check the stopped flag within the loop of the > AsynchronousListenerBus thread -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-11699) TrackStateRDDSuite fails on Jenkins builds
[ https://issues.apache.org/jira/browse/SPARK-11699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved SPARK-11699. Resolution: Duplicate Same as SPARK-11290 > TrackStateRDDSuite fails on Jenkins builds > -- > > Key: SPARK-11699 > URL: https://issues.apache.org/jira/browse/SPARK-11699 > Project: Spark > Issue Type: Test >Reporter: Ted Yu > > As of build #4087, > https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/4087/testReport/ > , TrackStateRDDSuite fails for both hadoop profiles. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11699) TrackStateRDDSuite fails on Jenkins builds
Ted Yu created SPARK-11699: -- Summary: TrackStateRDDSuite fails on Jenkins builds Key: SPARK-11699 URL: https://issues.apache.org/jira/browse/SPARK-11699 Project: Spark Issue Type: Test Reporter: Ted Yu As of build #4087, https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/4087/testReport/ , TrackStateRDDSuite fails for both hadoop profiles. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11661) We should still pushdown filters returned by a data source's unhandledFilters
[ https://issues.apache.org/jira/browse/SPARK-11661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002290#comment-15002290 ] Ted Yu commented on SPARK-11661: Should have looked back further. But it seems the test failure is not intermittent. > We should still pushdown filters returned by a data source's unhandledFilters > - > > Key: SPARK-11661 > URL: https://issues.apache.org/jira/browse/SPARK-11661 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Yin Huai >Assignee: Yin Huai >Priority: Blocker > Fix For: 1.6.0 > > > We added the unhandledFilters interface in SPARK-10978. So, a data source has a > chance to let Spark SQL know that for those returned filters, it is possible > that the data source will not apply them to every row. So, Spark SQL should > use a Filter operator to evaluate those filters. However, if a filter is a > part of the returned unhandledFilters, we should still push it down. For example, > our internal data sources do not override this method; if we do not push down > those filters, we are actually turning off the filter pushdown feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11661) We should still pushdown filters returned by a data source's unhandledFilters
[ https://issues.apache.org/jira/browse/SPARK-11661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002256#comment-15002256 ] Ted Yu commented on SPARK-11661: Looks like org.apache.spark.streaming.rdd.TrackStateRDDSuite started to fail since this went in: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/4084/ > We should still pushdown filters returned by a data source's unhandledFilters > - > > Key: SPARK-11661 > URL: https://issues.apache.org/jira/browse/SPARK-11661 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Yin Huai >Assignee: Yin Huai >Priority: Blocker > Fix For: 1.6.0, 1.7.0 > > > We added the unhandledFilters interface in SPARK-10978. So, a data source has a > chance to let Spark SQL know that for those returned filters, it is possible > that the data source will not apply them to every row. So, Spark SQL should > use a Filter operator to evaluate those filters. However, if a filter is a > part of the returned unhandledFilters, we should still push it down. For example, > our internal data sources do not override this method; if we do not push down > those filters, we are actually turning off the filter pushdown feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
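For context, a data source opts into this contract by overriding unhandledFilters on BaseRelation (the interface added by SPARK-10978); a minimal illustrative sketch:

{code}
import org.apache.spark.sql.sources.{BaseRelation, EqualTo, Filter}

// A relation that can evaluate only equality predicates natively. Everything
// else is reported back as unhandled, so Spark SQL re-checks those rows with
// a Filter operator -- but, per this issue, the same filters should still be
// pushed to the source as a best-effort scan reduction.
abstract class ExampleRelation extends BaseRelation {
  override def unhandledFilters(filters: Array[Filter]): Array[Filter] =
    filters.filterNot(_.isInstanceOf[EqualTo])
}
{code}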
[jira] [Created] (SPARK-11682) Commons-collections object deserialization may expose remote command execution vulnerability
Ted Yu created SPARK-11682: -- Summary: Commons-collections object deserialization may expose remote command execution vulnerability Key: SPARK-11682 URL: https://issues.apache.org/jira/browse/SPARK-11682 Project: Spark Issue Type: Bug Reporter: Ted Yu http://foxglovesecurity.com/2015/11/06/what-do-weblogic-websphere-jboss-jenkins-opennms-and-your-application-have-in-common-this-vulnerability/ TL;DR: If you have commons-collections on your classpath and accept and process Java object serialization data, then you may have an exploitable remote command execution vulnerability. In ./launcher/src/main/java/org/apache/spark/launcher/LauncherConnection.java : {code} ObjectInputStream in = new ObjectInputStream(socket.getInputStream()); while (!closed) { Message msg = (Message) in.readObject(); {code} There may be other occurrence(s). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
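A common mitigation, shown here as a general sketch rather than the fix Spark adopted, is look-ahead deserialization: override resolveClass to reject anything outside an expected set of classes before the payload is materialized:

{code}
import java.io.{InputStream, InvalidClassException, ObjectInputStream, ObjectStreamClass}

class FilteringObjectInputStream(in: InputStream, allowed: Set[String])
    extends ObjectInputStream(in) {
  override protected def resolveClass(desc: ObjectStreamClass): Class[_] = {
    if (!allowed.contains(desc.getName)) {
      throw new InvalidClassException(desc.getName, "unexpected class in stream")
    }
    super.resolveClass(desc)
  }
}
{code}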
[jira] [Updated] (SPARK-11662) Call startExecutorDelegationTokenRenewer() ahead of client app submission
[ https://issues.apache.org/jira/browse/SPARK-11662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-11662: --- Component/s: YARN > Call startExecutorDelegationTokenRenewer() ahead of client app submission > - > > Key: SPARK-11662 > URL: https://issues.apache.org/jira/browse/SPARK-11662 > Project: Spark > Issue Type: Bug > Components: YARN >Reporter: Ted Yu > > As reported in the thread 'Creating new Spark context when running in Secure > YARN fails', IOException may be thrown when SparkContext is stopped and > started again working with secure YARN cluster: > {code} > 15/11/11 10:19:53 ERROR spark.SparkContext: Error initializing SparkContext. > org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token > can be issued only with kerberos or web authentication > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:6638) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:563) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:987) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043) > at org.apache.hadoop.ipc.Client.call(Client.java:1476) > at org.apache.hadoop.ipc.Client.call(Client.java:1407) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy12.getDelegationToken(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:933) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy13.getDelegationToken(Unknown Source) > at > org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:1044) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1543) > at > org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:530) > at > org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:508) > at > org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2228) > at > org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$obtainTokensForNamenodes$1.apply(YarnSparkHadoopUtil.scala:126) > at > 
org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$obtainTokensForNamenodes$1.apply(YarnSparkHadoopUtil.scala:123) > at scala.collection.immutable.Set$Set1.foreach(Set.scala:74) > at > org.apache.spark.deploy.yarn.YarnSparkHadoopUtil.obtainTokensForNamenodes(YarnSparkHadoopUtil.scala:123) > at > org.apache.spark.deploy.yarn.Client.getTokenRenewalInterval(Client.scala:495) > at > org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:528) > at > org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:628) > at > org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:119) > at > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56) > at > org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144) > at org.apache.spark.SparkContext.<init>(SparkContext.scala:523) > {code} > One fix is to call startExecutorDelegationTokenRenewer(conf) ahead of client > app submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11662) Call startExecutorDelegationTokenRenewer() ahead of client app submission
Ted Yu created SPARK-11662: -- Summary: Call startExecutorDelegationTokenRenewer() ahead of client app submission Key: SPARK-11662 URL: https://issues.apache.org/jira/browse/SPARK-11662 Project: Spark Issue Type: Bug Reporter: Ted Yu As reported in the thread 'Creating new Spark context when running in Secure YARN fails', IOException may be thrown when SparkContext is stopped and started again working with secure YARN cluster: {code} 15/11/11 10:19:53 ERROR spark.SparkContext: Error initializing SparkContext. org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:6638) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:563) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:987) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043) at org.apache.hadoop.ipc.Client.call(Client.java:1476) at org.apache.hadoop.ipc.Client.call(Client.java:1407) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) at com.sun.proxy.$Proxy12.getDelegationToken(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:933) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy13.getDelegationToken(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:1044) at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1543) at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:530) at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:508) at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2228) at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$obtainTokensForNamenodes$1.apply(YarnSparkHadoopUtil.scala:126) at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$$anonfun$obtainTokensForNamenodes$1.apply(YarnSparkHadoopUtil.scala:123) at scala.collection.immutable.Set$Set1.foreach(Set.scala:74) at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil.obtainTokensForNamenodes(YarnSparkHadoopUtil.scala:123) at 
org.apache.spark.deploy.yarn.Client.getTokenRenewalInterval(Client.scala:495) at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:528) at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:628) at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:119) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56) at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144) at org.apache.spark.SparkContext.<init>(SparkContext.scala:523) {code} One fix is to call startExecutorDelegationTokenRenewer(conf) ahead of client app submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11615) Drop @VisibleForTesting annotation
Ted Yu created SPARK-11615: -- Summary: Drop @VisibleForTesting annotation Key: SPARK-11615 URL: https://issues.apache.org/jira/browse/SPARK-11615 Project: Spark Issue Type: Bug Reporter: Ted Yu See http://search-hadoop.com/m/q3RTtjpe8r1iRbTj2 for discussion. Summary: the addition of the @VisibleForTesting annotation resulted in spark-shell malfunctioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11572) Exit AsynchronousListenerBus thread when stop() is called
[ https://issues.apache.org/jira/browse/SPARK-11572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-11572: --- Component/s: Spark Core > Exit AsynchronousListenerBus thread when stop() is called > - > > Key: SPARK-11572 > URL: https://issues.apache.org/jira/browse/SPARK-11572 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Ted Yu > > As vonnagy reported in the following thread: > http://search-hadoop.com/m/q3RTtk982kvIow22 > Attempts to join the thread in AsynchronousListenerBus resulted in a lockup > because the AsynchronousListenerBus thread was still receiving > SparkListenerExecutorMetricsUpdate messages from the DAGScheduler. > The proposed fix is to check the stopped flag within the loop of the > AsynchronousListenerBus thread -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11572) Exit AsynchronousListenerBus thread when stop() is called
Ted Yu created SPARK-11572: -- Summary: Exit AsynchronousListenerBus thread when stop() is called Key: SPARK-11572 URL: https://issues.apache.org/jira/browse/SPARK-11572 Project: Spark Issue Type: Bug Reporter: Ted Yu As vonnagy reported in the following thread: http://search-hadoop.com/m/q3RTtk982kvIow22 Attempts to join the thread in AsynchronousListenerBus resulted in a lockup because the AsynchronousListenerBus thread was still receiving SparkListenerExecutorMetricsUpdate messages from the DAGScheduler. The proposed fix is to check the stopped flag within the loop of the AsynchronousListenerBus thread -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
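A sketch of the proposed stopped-flag check, assuming the bus thread drains an event queue in a loop; names are simplified, not Spark's actual fields:

{code}
import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

val stopped = new AtomicBoolean(false)
val queue = new LinkedBlockingQueue[AnyRef]()

val busThread = new Thread("listener-bus") {
  override def run(): Unit = {
    // Re-check the flag on every iteration so stop() can end the loop even
    // while producers keep posting SparkListenerExecutorMetricsUpdate events.
    while (!stopped.get()) {
      val event = queue.poll(100, TimeUnit.MILLISECONDS)
      if (event != null) { /* dispatch to listeners */ }
    }
  }
}

def stop(): Unit = {
  stopped.set(true)
  busThread.join()  // now returns promptly instead of locking up
}
{code}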
[jira] [Commented] (SPARK-11371) Make "mean" an alias for "avg" operator
[ https://issues.apache.org/jira/browse/SPARK-11371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985438#comment-14985438 ] Ted Yu commented on SPARK-11371: [~rxin] [~yhuai]: Your comments are welcome. > Make "mean" an alias for "avg" operator > --- > > Key: SPARK-11371 > URL: https://issues.apache.org/jira/browse/SPARK-11371 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Ted Yu >Priority: Minor > Attachments: spark-11371-v1.patch > > > From Reynold in the thread 'Exception when using some aggregate operators' > (http://search-hadoop.com/m/q3RTt0xFr22nXB4/): > I don't think these are bugs. The SQL standard for average is "avg", not > "mean". Similarly, a distinct count is supposed to be written as > "count(distinct col)", not "countDistinct(col)". > We can, however, make "mean" an alias for "avg" to improve compatibility > between DataFrame and SQL. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11371) Make "mean" an alias for "avg" operator
[ https://issues.apache.org/jira/browse/SPARK-11371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985424#comment-14985424 ] Ted Yu commented on SPARK-11371: [~sowen]: Do you think it is worth adding the alias? Thanks > Make "mean" an alias for "avg" operator > --- > > Key: SPARK-11371 > URL: https://issues.apache.org/jira/browse/SPARK-11371 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Ted Yu >Priority: Minor > Attachments: spark-11371-v1.patch > > > From Reynold in the thread 'Exception when using some aggregate operators' > (http://search-hadoop.com/m/q3RTt0xFr22nXB4/): > I don't think these are bugs. The SQL standard for average is "avg", not > "mean". Similarly, a distinct count is supposed to be written as > "count(distinct col)", not "countDistinct(col)". > We can, however, make "mean" an alias for "avg" to improve compatibility > between DataFrame and SQL. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11442) Reduce numSlices for local metrics test of SparkListenerSuite
Ted Yu created SPARK-11442: -- Summary: Reduce numSlices for local metrics test of SparkListenerSuite Key: SPARK-11442 URL: https://issues.apache.org/jira/browse/SPARK-11442 Project: Spark Issue Type: Test Components: Tests Reporter: Ted Yu Priority: Minor In the thread http://search-hadoop.com/m/q3RTtcQiFSlTxeP/test+failed+due+to+OOME&subj=test+failed+due+to+OOME, it was discussed that memory consumption for SparkListenerSuite should be brought down. This is an attempt in that direction by reducing numSlices for the local metrics test. Before the change: Run completed in 57 seconds, 357 milliseconds. Reducing numSlices to 16 results in: Run completed in 44 seconds, 115 milliseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
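The knob in question is the numSlices argument of SparkContext.parallelize, which bounds how many partitions (and hence concurrent tasks and per-task metrics buffers) the test creates; illustratively, with sc an existing SparkContext:

{code}
// Fewer slices => fewer simultaneous tasks and less metrics data held at once.
val rdd = sc.parallelize(1 to 1000, numSlices = 16)
{code}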
[jira] [Commented] (SPARK-11435) Stop SparkContext at the end of subtest in SparkListenerSuite
[ https://issues.apache.org/jira/browse/SPARK-11435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984422#comment-14984422 ] Ted Yu commented on SPARK-11435: LocalSparkContext would close the SparkContext. > Stop SparkContext at the end of subtest in SparkListenerSuite > - > > Key: SPARK-11435 > URL: https://issues.apache.org/jira/browse/SPARK-11435 > Project: Spark > Issue Type: Improvement > Components: Tests >Reporter: Ted Yu >Priority: Minor > > Some subtests in SparkListenerSuite create a SparkContext without stopping it > explicitly upon completion of the subtest. > This issue is to stop the SparkContext explicitly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11435) Stop SparkContext at the end of subtest in SparkListenerSuite
Ted Yu created SPARK-11435: -- Summary: Stop SparkContext at the end of subtest in SparkListenerSuite Key: SPARK-11435 URL: https://issues.apache.org/jira/browse/SPARK-11435 Project: Spark Issue Type: Improvement Components: Tests Reporter: Ted Yu Priority: Minor Some subtests in SparkListenerSuite create a SparkContext without stopping it explicitly upon completion of the subtest. This issue is to stop the SparkContext explicitly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
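The cleanup pattern being asked for, as a hedged sketch of a subtest body:

{code}
import org.apache.spark.SparkContext

val sc = new SparkContext("local", "SparkListenerSuite")
try {
  // subtest body: register listeners, run jobs, assert
} finally {
  sc.stop()  // release the context even when an assertion fails
}
{code}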
[jira] [Commented] (SPARK-11348) Replace addOnCompleteCallback with addTaskCompletionListener() in UnsafeExternalSorter
[ https://issues.apache.org/jira/browse/SPARK-11348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983392#comment-14983392 ] Ted Yu commented on SPARK-11348: Without this change, make-distribution.sh is likely to bump into this issue on the master branch. > Replace addOnCompleteCallback with addTaskCompletionListener() in > UnsafeExternalSorter > -- > > Key: SPARK-11348 > URL: https://issues.apache.org/jira/browse/SPARK-11348 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Ted Yu >Priority: Minor > Attachments: spark-11348.txt > > > When running the command from SPARK-11318, I got the following: > {code} > [WARNING] > /home/hbase/spark/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java:[141,15] > [deprecation] > addOnCompleteCallback(Function0) in TaskContext has been deprecated > {code} > addOnCompleteCallback should be replaced with addTaskCompletionListener() -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
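The replacement is mechanical; shown in Scala form for brevity (the UnsafeExternalSorter call site itself is Java), with context and the cleanup body as assumed placeholders:

{code}
import org.apache.spark.TaskContext

// Deprecated:
//   context.addOnCompleteCallback(() => releaseResources())
// Preferred:
context.addTaskCompletionListener { (ctx: TaskContext) =>
  releaseResources()  // hypothetical cleanup routine
}
{code}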
[jira] [Commented] (SPARK-11371) Make "mean" an alias for "avg" operator
[ https://issues.apache.org/jira/browse/SPARK-11371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14981887#comment-14981887 ] Ted Yu commented on SPARK-11371: That's true. > Make "mean" an alias for "avg" operator > --- > > Key: SPARK-11371 > URL: https://issues.apache.org/jira/browse/SPARK-11371 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Ted Yu >Priority: Minor > Attachments: spark-11371-v1.patch > > > From Reynold in the thread 'Exception when using some aggregate operators' > (http://search-hadoop.com/m/q3RTt0xFr22nXB4/): > I don't think these are bugs. The SQL standard for average is "avg", not > "mean". Similarly, a distinct count is supposed to be written as > "count(distinct col)", not "countDistinct(col)". > We can, however, make "mean" an alias for "avg" to improve compatibility > between DataFrame and SQL. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11371) Make "mean" an alias for "avg" operator
[ https://issues.apache.org/jira/browse/SPARK-11371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14980484#comment-14980484 ] Ted Yu commented on SPARK-11371: Since I cannot assign the JIRA to myself, attaching a patch shows my intention to work on the JIRA. The background is that I wanted to open 3 PRs as of yesterday but I don't have as many email addresses (i.e., forked repos). I am more than willing to learn from experts how multiple outstanding PRs are managed. As for the mean alias, I quoted Reynold's response. I am open to discussion on whether this would ultimately go through. > Make "mean" an alias for "avg" operator > --- > > Key: SPARK-11371 > URL: https://issues.apache.org/jira/browse/SPARK-11371 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Ted Yu >Priority: Minor > Attachments: spark-11371-v1.patch > > > From Reynold in the thread 'Exception when using some aggregate operators' > (http://search-hadoop.com/m/q3RTt0xFr22nXB4/): > I don't think these are bugs. The SQL standard for average is "avg", not > "mean". Similarly, a distinct count is supposed to be written as > "count(distinct col)", not "countDistinct(col)". > We can, however, make "mean" an alias for "avg" to improve compatibility > between DataFrame and SQL. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11371) Make "mean" an alias for "avg" operator
[ https://issues.apache.org/jira/browse/SPARK-11371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-11371: --- Attachment: spark-11371-v1.patch > Make "mean" an alias for "avg" operator > --- > > Key: SPARK-11371 > URL: https://issues.apache.org/jira/browse/SPARK-11371 > Project: Spark > Issue Type: Improvement >Reporter: Ted Yu >Priority: Minor > Attachments: spark-11371-v1.patch > > > From Reynold in the thread 'Exception when using some aggregate operators' > (http://search-hadoop.com/m/q3RTt0xFr22nXB4/): > I don't think these are bugs. The SQL standard for average is "avg", not > "mean". Similarly, a distinct count is supposed to be written as > "count(distinct col)", not "countDistinct(col)". > We can, however, make "mean" an alias for "avg" to improve compatibility > between DataFrame and SQL. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11371) Make "mean" an alias for "avg" operator
Ted Yu created SPARK-11371: -- Summary: Make "mean" an alias for "avg" operator Key: SPARK-11371 URL: https://issues.apache.org/jira/browse/SPARK-11371 Project: Spark Issue Type: Improvement Reporter: Ted Yu Priority: Minor From Reynold in the thread 'Exception when using some aggregate operators' (http://search-hadoop.com/m/q3RTt0xFr22nXB4/): I don't think these are bugs. The SQL standard for average is "avg", not "mean". Similarly, a distinct count is supposed to be written as "count(distinct col)", not "countDistinct(col)". We can, however, make "mean" an alias for "avg" to improve compatibility between DataFrame and SQL. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
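On the DataFrame side both names already exist; the proposal extends the aliasing to SQL text. Illustratively, assuming df is an existing DataFrame with an "age" column:

{code}
import org.apache.spark.sql.functions.{avg, mean}

df.agg(avg("age"))   // the SQL-standard name
df.agg(mean("age"))  // DataFrame-side alias for the same Average expression
// After the change, the SQL form would work too:
//   sqlContext.sql("SELECT mean(age) FROM people")
{code}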
[jira] [Updated] (SPARK-11348) Replace addOnCompleteCallback with addTaskCompletionListener() in UnsafeExternalSorter
[ https://issues.apache.org/jira/browse/SPARK-11348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-11348: --- Priority: Minor (was: Trivial) > Replace addOnCompleteCallback with addTaskCompletionListener() in > UnsafeExternalSorter > -- > > Key: SPARK-11348 > URL: https://issues.apache.org/jira/browse/SPARK-11348 > Project: Spark > Issue Type: Bug >Reporter: Ted Yu >Priority: Minor > Attachments: spark-11348.txt > > > When running the command from SPARK-11318, I got the following: > {code} > [WARNING] > /home/hbase/spark/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java:[141,15] > [deprecation] > addOnCompleteCallback(Function0) in TaskContext has been deprecated > {code} > addOnCompleteCallback should be replaced with addTaskCompletionListener() -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11348) Replace addOnCompleteCallback with addTaskCompletionListener() in UnsafeExternalSorter
[ https://issues.apache.org/jira/browse/SPARK-11348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-11348: --- Attachment: spark-11348.txt > Replace addOnCompleteCallback with addTaskCompletionListener() in > UnsafeExternalSorter > -- > > Key: SPARK-11348 > URL: https://issues.apache.org/jira/browse/SPARK-11348 > Project: Spark > Issue Type: Bug >Reporter: Ted Yu >Priority: Trivial > Attachments: spark-11348.txt > > > When running the command from SPARK-11318, I got the following: > {code} > [WARNING] > /home/hbase/spark/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java:[141,15] > [deprecation] > addOnCompleteCallback(Function0) in TaskContext has been deprecated > {code} > addOnCompleteCallback should be replaced with addTaskCompletionListener() -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11348) Replace addOnCompleteCallback with addTaskCompletionListener() in UnsafeExternalSorter
Ted Yu created SPARK-11348: -- Summary: Replace addOnCompleteCallback with addTaskCompletionListener() in UnsafeExternalSorter Key: SPARK-11348 URL: https://issues.apache.org/jira/browse/SPARK-11348 Project: Spark Issue Type: Bug Reporter: Ted Yu Priority: Trivial When running the command from SPARK-11318, I got the following: {code} [WARNING] /home/hbase/spark/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java:[141,15] [deprecation] addOnCompleteCallback(Function0) in TaskContext has been deprecated {code} addOnCompleteCallback should be replaced with addTaskCompletionListener() -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11318) [DOC] Include hive profile in make-distribution.sh command
Ted Yu created SPARK-11318: -- Summary: [DOC] Include hive profile in make-distribution.sh command Key: SPARK-11318 URL: https://issues.apache.org/jira/browse/SPARK-11318 Project: Spark Issue Type: Improvement Reporter: Ted Yu Priority: Minor The tgz I built using the current command shown in building-spark.html does not contain the datanucleus jars which are included in the "boxed" Spark distributions. The hive profile should be included so that the tarball matches the Spark distribution. See the 'Problem with make-distribution.sh' thread on user@ for background. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
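For illustration, the kind of invocation the docs change points at; profiles other than -Phive depend on the target environment and are assumptions here:

{code}
./make-distribution.sh --name custom-spark --tgz -Phadoop-2.4 -Pyarn -Phive -Phive-thriftserver
{code}

With -Phive enabled, the datanucleus jars should land in lib/ of the resulting tarball, matching the released distributions.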
[jira] [Resolved] (SPARK-11286) Make Outbox stopped exception singleton
[ https://issues.apache.org/jira/browse/SPARK-11286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved SPARK-11286. Resolution: Won't Fix > Make Outbox stopped exception singleton > --- > > Key: SPARK-11286 > URL: https://issues.apache.org/jira/browse/SPARK-11286 > Project: Spark > Issue Type: Improvement >Reporter: Ted Yu >Priority: Trivial > > In two places in Outbox.scala, a new SparkException is created for the Outbox-stopped > condition. > Create a singleton for the Outbox-stopped exception and use it instead of > creating the exception every time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11286) Make Outbox stopped exception singleton
Ted Yu created SPARK-11286: -- Summary: Make Outbox stopped exception singleton Key: SPARK-11286 URL: https://issues.apache.org/jira/browse/SPARK-11286 Project: Spark Issue Type: Improvement Reporter: Ted Yu Priority: Trivial In two places in Outbox.scala, a new SparkException is created for the Outbox-stopped condition. Create a singleton for the Outbox-stopped exception and use it instead of creating the exception every time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
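A sketch of the proposed change inside Outbox.scala (field name illustrative); note the trade-off that a shared instance carries a single pre-recorded stack trace for all throwers:

{code}
// Before: each call site did `new SparkException("Outbox is stopped")`.
// After: one shared instance reused by both call sites.
private val outboxStoppedException = new SparkException("Outbox is stopped")
{code}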
[jira] [Created] (SPARK-11172) Close JsonParser/Generator in test
Ted Yu created SPARK-11172: -- Summary: Close JsonParser/Generator in test Key: SPARK-11172 URL: https://issues.apache.org/jira/browse/SPARK-11172 Project: Spark Issue Type: Task Reporter: Ted Yu Priority: Trivial JsonParser / JsonGenerator instances created in tests should be closed. This is a continuation of SPARK-11124 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
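The pattern being asked for, sketched with Jackson's streaming API:

{code}
import com.fasterxml.jackson.core.JsonFactory

val factory = new JsonFactory()
val parser = factory.createParser("""{"a": 1}""")
try {
  // drive the parser in the test body
} finally {
  parser.close()  // release the underlying buffers and any IO
}
{code}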
[jira] [Commented] (SPARK-10985) Avoid passing evicted blocks throughout BlockManager / CacheManager
[ https://issues.apache.org/jira/browse/SPARK-10985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954202#comment-14954202 ] Ted Yu commented on SPARK-10985: I am a bit confused by the assignment of this JIRA. Normally the assignee is 'Apache Spark' before a pull request comes up. It has been at least 2 days but I don't see a PR. Did I miss something? > Avoid passing evicted blocks throughout BlockManager / CacheManager > --- > > Key: SPARK-10985 > URL: https://issues.apache.org/jira/browse/SPARK-10985 > Project: Spark > Issue Type: Sub-task > Components: Block Manager, Spark Core >Reporter: Andrew Or >Assignee: Bowen Zhang >Priority: Minor > > This is a minor refactoring task. > Currently when we attempt to put a block in, we get back an array buffer of > blocks that are dropped in the process. We do this to propagate these blocks > back to our TaskContext, which will add them to its TaskMetrics so we can see > them in the SparkUI storage tab properly. > Now that we have TaskContext.get, we can just use that to propagate this > information. This simplifies a lot of the signatures and gets rid of weird > return types like the following everywhere: > {code} > ArrayBuffer[(BlockId, BlockStatus)] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
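The refactoring direction described above, as a heavily hedged sketch: instead of threading ArrayBuffer[(BlockId, BlockStatus)] through return values, look up the ambient task and attach statuses to its metrics. TaskContext.get() is real; the updatedBlocks field is recalled from the TaskMetrics of that era and should be treated as an assumption:

{code}
// Inside BlockManager, after `blockId` was dropped with resulting `status`:
Option(TaskContext.get()).foreach { ctx =>
  val metrics = ctx.taskMetrics()
  // Assumed accessor: append the dropped block's status to the task's metrics.
  metrics.updatedBlocks = Some(metrics.updatedBlocks.getOrElse(Seq.empty) :+ (blockId -> status))
}
{code}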
[jira] [Resolved] (SPARK-11048) Use ForkJoinPool as executorService
[ https://issues.apache.org/jira/browse/SPARK-11048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved SPARK-11048. Resolution: Won't Fix > Use ForkJoinPool as executorService > --- > > Key: SPARK-11048 > URL: https://issues.apache.org/jira/browse/SPARK-11048 > Project: Spark > Issue Type: Improvement >Reporter: Ted Yu >Priority: Minor > > ForkJoinPool: threads are created only if there are waiting tasks. They > expire after 2 seconds (it's > hardcoded in the JDK code). > ForkJoinPool is better than ThreadPoolExecutor. > It's available in JDK 1.7 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11048) Use ForkJoinPool as executorService
Ted Yu created SPARK-11048: -- Summary: Use ForkJoinPool as executorService Key: SPARK-11048 URL: https://issues.apache.org/jira/browse/SPARK-11048 Project: Spark Issue Type: Improvement Reporter: Ted Yu Priority: Minor ForkJoinPool: threads are created only if there are waiting tasks. They expire after 2 seconds (it's hardcoded in the JDK code). ForkJoinPool is better than ThreadPoolExecutor. It's available in JDK 1.7 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
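For comparison, both pools are java.util.concurrent.ExecutorService implementations as of JDK 7:

{code}
import java.util.concurrent.{ExecutorService, Executors, ForkJoinPool}

// Fixed pool: worker threads are kept alive until the pool is shut down.
val fixed: ExecutorService = Executors.newFixedThreadPool(8)

// ForkJoinPool: workers are spawned only while tasks are queued and time out
// when idle (the timeout is hardcoded inside the JDK).
val fj: ExecutorService = new ForkJoinPool(8)
{code}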
[jira] [Created] (SPARK-11006) Rename NullColumnAccess as NullColumnAccessor
Ted Yu created SPARK-11006: -- Summary: Rename NullColumnAccess as NullColumnAccessor Key: SPARK-11006 URL: https://issues.apache.org/jira/browse/SPARK-11006 Project: Spark Issue Type: Task Reporter: Ted Yu Priority: Trivial In sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnAccessor.scala , NullColumnAccess should be renamed as NullColumnAccessor so that the same convention is adhered to for the accessors. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4066) Make whether maven build fails on scalastyle violation configurable
[ https://issues.apache.org/jira/browse/SPARK-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-4066: -- Description: Here is the thread Koert started: http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bit&subj=scalastyle+annoys+me+a+little+bit It would add flexibility to make whether the maven build fails due to scalastyle violations configurable. was: Here is the thread Koert started: http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bit&subj=scalastyle+annoys+me+a+little+bit It would add flexibility to make whether the maven build fails due to scalastyle violations configurable. > Make whether maven build fails on scalastyle violation configurable > > > Key: SPARK-4066 > URL: https://issues.apache.org/jira/browse/SPARK-4066 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Ted Yu >Priority: Minor > Labels: style > Attachments: spark-4066-v1.txt > > > Here is the thread Koert started: > http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bit&subj=scalastyle+annoys+me+a+little+bit > It would add flexibility to make whether the maven build fails due to scalastyle violations > configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
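What the configurability could look like in the scalastyle-maven-plugin section of the pom, sketched on the assumption that its failOnViolation switch is bound to a user-settable property:

{code}
<plugin>
  <groupId>org.scalastyle</groupId>
  <artifactId>scalastyle-maven-plugin</artifactId>
  <configuration>
    <!-- -Dscalastyle.failOnViolation=false demotes violations to warnings. -->
    <failOnViolation>${scalastyle.failOnViolation}</failOnViolation>
  </configuration>
</plugin>
{code}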
[jira] [Updated] (SPARK-10787) Consider replacing ObjectOutputStream for serialization to prevent OOME
[ https://issues.apache.org/jira/browse/SPARK-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-10787: --- Priority: Major (was: Minor) > Consider replacing ObjectOutputStream for serialization to prevent OOME > --- > > Key: SPARK-10787 > URL: https://issues.apache.org/jira/browse/SPARK-10787 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Ted Yu > > In the thread 'Spark ClosureCleaner or java serializer OOM when trying to grow' > (http://search-hadoop.com/m/q3RTtAr5X543dNn), Jay Luan reported that > ClosureCleaner#ensureSerializable() resulted in an OOME. > The cause was that ObjectOutputStream keeps a strong reference to every > object that was written to it. > This issue tries to avoid OOME by considering an alternative to > ObjectOutputStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10787) Consider replacing ObjectOutputStream for serialization to prevent OOME
[ https://issues.apache.org/jira/browse/SPARK-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-10787: --- Summary: Consider replacing ObjectOutputStream for serialization to prevent OOME (was: Reset ObjectOutputStream more often to prevent OOME) > Consider replacing ObjectOutputStream for serialization to prevent OOME > --- > > Key: SPARK-10787 > URL: https://issues.apache.org/jira/browse/SPARK-10787 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Ted Yu >Priority: Minor > > In the thread 'Spark ClosureCleaner or java serializer OOM when trying to grow' > (http://search-hadoop.com/m/q3RTtAr5X543dNn), Jay Luan reported that > ClosureCleaner#ensureSerializable() resulted in an OOME. > The cause was that ObjectOutputStream keeps a strong reference to every > object that was written to it. > This issue tries to avoid OOME by calling reset() more often. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10787) Consider replacing ObjectOutputStream for serialization to prevent OOME
[ https://issues.apache.org/jira/browse/SPARK-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-10787: --- Description: In the thread 'Spark ClosureCleaner or java serializer OOM when trying to grow' (http://search-hadoop.com/m/q3RTtAr5X543dNn), Jay Luan reported that ClosureCleaner#ensureSerializable() resulted in an OOME. The cause was that ObjectOutputStream keeps a strong reference to every object that was written to it. This issue tries to avoid OOME by considering an alternative to ObjectOutputStream. was: In the thread 'Spark ClosureCleaner or java serializer OOM when trying to grow' (http://search-hadoop.com/m/q3RTtAr5X543dNn), Jay Luan reported that ClosureCleaner#ensureSerializable() resulted in an OOME. The cause was that ObjectOutputStream keeps a strong reference to every object that was written to it. This issue tries to avoid OOME by calling reset() more often. > Consider replacing ObjectOutputStream for serialization to prevent OOME > --- > > Key: SPARK-10787 > URL: https://issues.apache.org/jira/browse/SPARK-10787 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Ted Yu >Priority: Minor > > In the thread 'Spark ClosureCleaner or java serializer OOM when trying to grow' > (http://search-hadoop.com/m/q3RTtAr5X543dNn), Jay Luan reported that > ClosureCleaner#ensureSerializable() resulted in an OOME. > The cause was that ObjectOutputStream keeps a strong reference to every > object that was written to it. > This issue tries to avoid OOME by considering an alternative to > ObjectOutputStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10787) Reset ObjectOutputStream more often to prevent OOME
[ https://issues.apache.org/jira/browse/SPARK-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934292#comment-14934292 ] Ted Yu commented on SPARK-10787: I can think of two approaches: 1. Clone ObjectOutputStream, using weak references. We would need to consider licensing. Also, ObjectOutputStream may reference Java internal methods / fields, which would make maintaining the clone difficult. 2. Switch to Kryo-based serialization. > Reset ObjectOutputStream more often to prevent OOME > --- > > Key: SPARK-10787 > URL: https://issues.apache.org/jira/browse/SPARK-10787 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Ted Yu >Priority: Minor > > In the thread 'Spark ClosureCleaner or java serializer OOM when trying to grow' > (http://search-hadoop.com/m/q3RTtAr5X543dNn), Jay Luan reported that > ClosureCleaner#ensureSerializable() resulted in an OOME. > The cause was that ObjectOutputStream keeps a strong reference to every > object that was written to it. > This issue tries to avoid OOME by calling reset() more often. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
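For illustration, approach 2 above maps onto Spark's existing Kryo-backed serializer. A minimal sketch, assuming Spark 1.x's public KryoSerializer API; the payload being serialized is made up:
{code}
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoSerializer

// Kryo-based serialization does not retain a strong reference to every
// object ever written, so it sidesteps the ObjectOutputStream growth issue.
val ser = new KryoSerializer(new SparkConf()).newInstance()
val bytes = ser.serialize(Seq(1, 2, 3))          // returns a java.nio.ByteBuffer
val restored = ser.deserialize[Seq[Int]](bytes)  // round-trip check
{code}
The trade-off is that Kryo is not a drop-in replacement for Java serialization semantics, which is part of why the issue weighs both approaches.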
[jira] [Updated] (SPARK-10787) Reset ObjectOutputStream more often to prevent OOME
[ https://issues.apache.org/jira/browse/SPARK-10787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-10787: --- Description: In the thread 'Spark ClosureCleaner or java serializer OOM when trying to grow' (http://search-hadoop.com/m/q3RTtAr5X543dNn), Jay Luan reported that ClosureCleaner#ensureSerializable() resulted in an OOME. The cause was that ObjectOutputStream keeps a strong reference to every object that was written to it. This issue tries to avoid OOME by calling reset() more often. was: In the thread 'Spark ClosureCleaner or java serializer OOM when trying to grow', Jay Luan reported that ClosureCleaner#ensureSerializable() resulted in an OOME. The cause was that ObjectOutputStream keeps a strong reference to every object that was written to it. This issue tries to avoid OOME by calling reset() more often. > Reset ObjectOutputStream more often to prevent OOME > --- > > Key: SPARK-10787 > URL: https://issues.apache.org/jira/browse/SPARK-10787 > Project: Spark > Issue Type: Bug >Reporter: Ted Yu > > In the thread 'Spark ClosureCleaner or java serializer OOM when trying to grow' > (http://search-hadoop.com/m/q3RTtAr5X543dNn), Jay Luan reported that > ClosureCleaner#ensureSerializable() resulted in an OOME. > The cause was that ObjectOutputStream keeps a strong reference to every > object that was written to it. > This issue tries to avoid OOME by calling reset() more often. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10787) Reset ObjectOutputStream more often to prevent OOME
Ted Yu created SPARK-10787: -- Summary: Reset ObjectOutputStream more often to prevent OOME Key: SPARK-10787 URL: https://issues.apache.org/jira/browse/SPARK-10787 Project: Spark Issue Type: Bug Reporter: Ted Yu In the thread 'Spark ClosureCleaner or java serializer OOM when trying to grow', Jay Luan reported that ClosureCleaner#ensureSerializable() resulted in an OOME. The cause was that ObjectOutputStream keeps a strong reference to every object that was written to it. This issue tries to avoid OOME by calling reset() more often. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
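For illustration, the reset() approach described above could look like the following sketch. The stream, the objects written, and the reset interval are all illustrative assumptions, not Spark's actual code:
{code}
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Calling reset() periodically clears the stream's handle table, so objects
// written earlier become eligible for garbage collection.
val objectsToWrite: Iterator[AnyRef] = Iterator.fill(10000)(new Array[Byte](32))
val oos = new ObjectOutputStream(new ByteArrayOutputStream())
val resetInterval = 1000  // illustrative; a real patch would tune this
var count = 0
for (obj <- objectsToWrite) {
  oos.writeObject(obj)
  count += 1
  if (count % resetInterval == 0) oos.reset()
}
oos.close()
{code}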
[jira] [Created] (SPARK-10721) Log warning when file deletion fails
Ted Yu created SPARK-10721: -- Summary: Log warning when file deletion fails Key: SPARK-10721 URL: https://issues.apache.org/jira/browse/SPARK-10721 Project: Spark Issue Type: Bug Reporter: Ted Yu Priority: Minor There are several places in the code base where the return value from File.delete() is ignored. This issue adds checks of the boolean return value and logs a warning when deletion fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
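A minimal sketch of the proposed check, assuming Spark 1.x's Logging trait; the helper object and method name are made up for illustration:
{code}
import java.io.File
import org.apache.spark.Logging

object FileCleanup extends Logging {
  // Check the boolean result of File.delete() instead of ignoring it.
  def deleteWithWarning(file: File): Unit = {
    if (!file.delete()) {
      logWarning(s"Failed to delete file: ${file.getAbsolutePath}")
    }
  }
}
{code}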
[jira] [Created] (SPARK-10701) Expose SparkContext#stopped flag with @DeveloperApi
Ted Yu created SPARK-10701: -- Summary: Expose SparkContext#stopped flag with @DeveloperApi Key: SPARK-10701 URL: https://issues.apache.org/jira/browse/SPARK-10701 Project: Spark Issue Type: Improvement Reporter: Ted Yu Priority: Minor SPARK-9522 added the stopped flag as private[spark]. See this thread: http://search-hadoop.com/m/q3RTtqvncy17sSTx1 We should expose this flag to developers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
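A hypothetical sketch of what exposing the flag could look like; the class here is a stand-in, not SparkContext's actual code:
{code}
import java.util.concurrent.atomic.AtomicBoolean
import org.apache.spark.annotation.DeveloperApi

class SparkContextLike {
  private val stopped = new AtomicBoolean(false)

  @DeveloperApi
  def isStopped: Boolean = stopped.get()  // read-only view for developers

  def stop(): Unit = stopped.set(true)
}
{code}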
[jira] [Created] (SPARK-10561) Provide tooling for auto-generating Spark SQL reference manual
Ted Yu created SPARK-10561: -- Summary: Provide tooling for auto-generating Spark SQL reference manual Key: SPARK-10561 URL: https://issues.apache.org/jira/browse/SPARK-10561 Project: Spark Issue Type: Improvement Reporter: Ted Yu Here is the discussion thread: http://search-hadoop.com/m/q3RTtcD20F1o62xE Richard Hillegas made the following suggestion: A machine-generated BNF, however, is easy to imagine. But perhaps not so easy to implement. Spark's SQL grammar is implemented in Scala, extending the DSL support provided by the Scala language. I am new to programming in Scala, so I don't know whether the Scala ecosystem provides any good tools for reverse-engineering a BNF from a class which extends scala.util.parsing.combinator.syntactical.StandardTokenParsers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10546) Check partitionId's range in ExternalSorter#spill()
Ted Yu created SPARK-10546: -- Summary: Check partitionId's range in ExternalSorter#spill() Key: SPARK-10546 URL: https://issues.apache.org/jira/browse/SPARK-10546 Project: Spark Issue Type: Improvement Affects Versions: 1.4.1 Reporter: Ted Yu Priority: Minor See this thread for background: http://search-hadoop.com/m/q3RTt0rWvIkHAE81 We should check the range of the partition Id and raise an exception with a meaningful message. Alternatively, we can use abs() and modulo to force the partition Id into the legitimate range. However, the expectation is that the user should correct the logic error in his / her code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
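A minimal sketch of the proposed range check; the method and message are illustrative, not ExternalSorter's actual internals:
{code}
// Fail fast with a meaningful message instead of spilling with a bad id.
def checkPartitionId(partitionId: Int, numPartitions: Int): Unit = {
  require(partitionId >= 0 && partitionId < numPartitions,
    s"partition Id: $partitionId should be in the range [0, $numPartitions)")
}
{code}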
[jira] [Resolved] (SPARK-10074) Include Float in @specialized annotation
[ https://issues.apache.org/jira/browse/SPARK-10074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved SPARK-10074. Resolution: Later Until the need arises. > Include Float in @specialized annotation > > > Key: SPARK-10074 > URL: https://issues.apache.org/jira/browse/SPARK-10074 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Ted Yu >Priority: Minor > > There are several places in the Spark codebase where we use the @specialized > annotation covering Long and Double. > e.g. in OpenHashMap.scala: > {code} > class OpenHashMap[K : ClassTag, @specialized(Long, Int, Double) V: ClassTag]( > initialCapacity: Int) > {code} > Float should be added to the @specialized annotation as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10074) Include Float in @specialized annotation
[ https://issues.apache.org/jira/browse/SPARK-10074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701388#comment-14701388 ] Ted Yu commented on SPARK-10074: I would argue that Double should be taken out of MutablePair (and other pertinent classes) as well, since there is currently no use for it. > Include Float in @specialized annotation > > > Key: SPARK-10074 > URL: https://issues.apache.org/jira/browse/SPARK-10074 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Ted Yu >Priority: Minor > > There are several places in the Spark codebase where we use the @specialized > annotation covering Long and Double. > e.g. in OpenHashMap.scala: > {code} > class OpenHashMap[K : ClassTag, @specialized(Long, Int, Double) V: ClassTag]( > initialCapacity: Int) > {code} > Float should be added to the @specialized annotation as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10074) Include Float in @specialized annotation
[ https://issues.apache.org/jira/browse/SPARK-10074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701332#comment-14701332 ] Ted Yu commented on SPARK-10074: I performed the following search in the Spark codebase: find . -name '*.scala' -exec grep 'MutablePair.*Double' {} \; -print There is only one match: {code} case class MutablePair[@specialized(Int, Long, Double, Char, Boolean/* , AnyRef */) T1, ./core/src/main/scala/org/apache/spark/util/MutablePair.scala {code} I think adding Float would provide parity with Double, potentially benefiting future use. > Include Float in @specialized annotation > > > Key: SPARK-10074 > URL: https://issues.apache.org/jira/browse/SPARK-10074 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Ted Yu >Priority: Minor > > There are several places in the Spark codebase where we use the @specialized > annotation covering Long and Double. > e.g. in OpenHashMap.scala: > {code} > class OpenHashMap[K : ClassTag, @specialized(Long, Int, Double) V: ClassTag]( > initialCapacity: Int) > {code} > Float should be added to the @specialized annotation as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10074) Include Float in @specialized annotation
Ted Yu created SPARK-10074: -- Summary: Include Float in @specialized annotation Key: SPARK-10074 URL: https://issues.apache.org/jira/browse/SPARK-10074 Project: Spark Issue Type: Improvement Reporter: Ted Yu Priority: Minor There are several places in the Spark codebase where we use the @specialized annotation covering Long and Double. e.g. in OpenHashMap.scala: {code} class OpenHashMap[K : ClassTag, @specialized(Long, Int, Double) V: ClassTag]( initialCapacity: Int) {code} Float should be added to the @specialized annotation as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
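For concreteness, the proposed change to the OpenHashMap example above would look like the following sketch (illustrative, not the actual patch):
{code}
import scala.reflect.ClassTag

// Float added alongside the existing Long, Int and Double specializations.
class OpenHashMap[K: ClassTag, @specialized(Long, Int, Double, Float) V: ClassTag](
    initialCapacity: Int)
{code}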
[jira] [Created] (SPARK-9446) Clear Active SparkContext in stop() method
Ted Yu created SPARK-9446: - Summary: Clear Active SparkContext in stop() method Key: SPARK-9446 URL: https://issues.apache.org/jira/browse/SPARK-9446 Project: Spark Issue Type: Bug Reporter: Ted Yu In the thread 'stopped SparkContext remaining active' on the mailing list, Andres observed the following in the driver log: {code} 15/07/29 15:17:09 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster has disassociated: 15/07/29 15:17:09 INFO YarnClientSchedulerBackend: Shutting down all executors Exception in thread "Yarn application state monitor" org.apache.spark.SparkException: Error asking standalone scheduler to shut down executors at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.stopExecutors(CoarseGrainedSchedulerBackend.scala:261) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.stop(CoarseGrainedSchedulerBackend.scala:266) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:158) at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:416) at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1411) at org.apache.spark.SparkContext.stop(SparkContext.scala:1644) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:139) Caused by: java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1325) at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208) at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:190)15/07/29 15:17:09 INFO YarnClientSchedulerBackend: Asking each executor to shut down at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.stopExecutors(CoarseGrainedSchedulerBackend.scala:257) ... 6 more {code} The effect of the above exception is that a stopped SparkContext is returned to the user, since SparkContext.clearActiveContext() is not called. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
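A self-contained sketch of the proposed fix: clear the active-context slot in a finally block, so a failure while stopping executors cannot leave a stopped context registered as active. All names here are illustrative stand-ins, not SparkContext's actual fields:
{code}
import java.util.concurrent.atomic.AtomicReference

object ActiveContext {
  private val active = new AtomicReference[AnyRef]()
  def set(ctx: AnyRef): Unit = active.set(ctx)
  def clear(): Unit = active.set(null)
}

def stop(shutdownExecutors: () => Unit): Unit = {
  try {
    shutdownExecutors()  // may throw, as in the reported InterruptedException
  } finally {
    ActiveContext.clear()  // runs even if shutdown throws
  }
}
{code}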
[jira] [Updated] (SPARK-5427) Add support for floor function in Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-5427: -- Description: floor() function is supported in Hive SQL. This issue is to add floor() function to Spark SQL. Related thread: http://search-hadoop.com/m/JW1q563fc22 was: floor() function is supported in Hive SQL. This issue is to add floor() function to Spark SQL. Related thread: http://search-hadoop.com/m/JW1q563fc22 > Add support for floor function in Spark SQL > --- > > Key: SPARK-5427 > URL: https://issues.apache.org/jira/browse/SPARK-5427 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Ted Yu > Labels: math > > floor() function is supported in Hive SQL. > This issue is to add floor() function to Spark SQL. > Related thread: http://search-hadoop.com/m/JW1q563fc22 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8336) Fix NullPointerException with functions.rand()
Ted Yu created SPARK-8336: - Summary: Fix NullPointerException with functions.rand() Key: SPARK-8336 URL: https://issues.apache.org/jira/browse/SPARK-8336 Project: Spark Issue Type: Bug Reporter: Ted Yu The problem was first reported by Justin Yip in the thread 'NullPointerException with functions.rand()' Here is how to reproduce the problem: {code} sqlContext.createDataFrame(Seq((1,2), (3, 100))).withColumn("index", rand(30)).show() {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7853) ClassNotFoundException for SparkSQL
[ https://issues.apache.org/jira/browse/SPARK-7853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558316#comment-14558316 ] Ted Yu commented on SPARK-7853: --- Subject says ClassNotFoundException. Which class couldn't be found ? > ClassNotFoundException for SparkSQL > --- > > Key: SPARK-7853 > URL: https://issues.apache.org/jira/browse/SPARK-7853 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.0 >Reporter: Cheng Hao >Priority: Blocker > > Reproduce steps: > {code} > bin/spark-sql --jars ./sql/data/files/TestSerDe.jar > spark-sql> CREATE TABLE alter1(a INT, b INT) ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.TestSerDe'; > {code} > Throws Exception like: > {panel} > 15/05/25 01:33:35 ERROR thriftserver.SparkSQLDriver: Failed in [CREATE TABLE > alter1(a INT, b INT) ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.TestSerDe'] > org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution > Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot > validate serde: org.apache.hadoop.hive.serde2.TestSerDe > at > org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:333) > at > org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:310) > at > org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:139) > at > org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:310) > at > org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:300) > at > org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:457) > at > org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57) > at > org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57) > at > org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87) > at > org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:922) > at > org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:922) > at org.apache.spark.sql.DataFrame.(DataFrame.scala:147) > at org.apache.spark.sql.DataFrame.(DataFrame.scala:131) > at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51) > at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:727) > at > org.apache.spark.sql.hive.thriftserver.AbstractSparkSQLDriver.run(AbstractSparkSQLDriver.scala:57) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:283) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:218) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:601) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {panel} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7538) Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge
[ https://issues.apache.org/jira/browse/SPARK-7538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540083#comment-14540083 ] Ted Yu commented on SPARK-7538: --- As mentioned by Cody Koeninger on the mailing list, using spark-streaming-kafka-assembly_2.10:1.3.1 would resolve the issue: {code} $ jar tvf ~/Downloads/spark-streaming-kafka-assembly_2.10-1.3.1.jar | grep yammer | grep Gauge 1329 Sat Apr 11 04:25:50 PDT 2015 com/yammer/metrics/core/Gauge.class {code} > Kafka stream fails: java.lang.NoClassDefFound com/yammer/metrics/core/Gauge > --- > > Key: SPARK-7538 > URL: https://issues.apache.org/jira/browse/SPARK-7538 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.3.1 > Environment: Ubuntu 14.04 LTS > java version "1.7.0_79" > OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2) > OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode) > Spark 1.3.1 release. >Reporter: Lee McFadden > > We have a simple streaming job, the components of which work fine in a batch > environment reading from a cassandra table as the source. > We adapted it to work with streaming using the Python libs. > Submit command line: > {code} > /home/ubuntu/spark/spark-1.3.1/bin/spark-submit \ > --packages > TargetHolding/pyspark-cassandra:0.1.4,org.apache.spark:spark-streaming-kafka_2.10:1.3.1 > \ > --conf > spark.cassandra.connection.host=10.10.103.172,10.10.102.160,10.10.101.79 \ > --master spark://127.0.0.1:7077 \ > affected_hosts.py > {code} > When we run the streaming job everything starts just fine, then we see the > following in the logs: > {code} > 15/05/11 19:50:46 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 70, > ip-10-10-102-53.us-west-2.compute.internal): java.lang.NoClassDefFoundError: > com/yammer/metrics/core/Gauge > at > kafka.consumer.ZookeeperConsumerConnector.createFetcher(ZookeeperConsumerConnector.scala:151) > at > kafka.consumer.ZookeeperConsumerConnector.(ZookeeperConsumerConnector.scala:115) > at > kafka.consumer.ZookeeperConsumerConnector.(ZookeeperConsumerConnector.scala:128) > at kafka.consumer.Consumer$.create(ConsumerConnector.scala:89) > at > org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:100) > at > org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:121) > at > org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:106) > at > org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$8.apply(ReceiverTracker.scala:298) > at > org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$8.apply(ReceiverTracker.scala:290) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1498) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1498) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) > at org.apache.spark.scheduler.Task.run(Task.scala:64) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.ClassNotFoundException: com.yammer.metrics.core.Gauge > at java.net.URLClassLoader$1.run(URLClassLoader.java:372) > at java.net.URLClassLoader$1.run(URLClassLoader.java:361) > at 
java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:360) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > ... 17 more > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7450) Use UNSAFE.getLong() to speed up BitSetMethods#anySet()
Ted Yu created SPARK-7450: - Summary: Use UNSAFE.getLong() to speed up BitSetMethods#anySet() Key: SPARK-7450 URL: https://issues.apache.org/jira/browse/SPARK-7450 Project: Spark Issue Type: Improvement Reporter: Ted Yu Currently BitSetMethods#anySet() traverses the BitSet in bytes. We can use UNSAFE.getLong() for a speedup. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
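The idea, sketched over a plain Array[Long] for clarity; the real BitSetMethods code reads raw memory through Unsafe, which this sketch does not attempt to reproduce:
{code}
// Scanning one 64-bit word at a time checks 8 bytes per comparison,
// instead of one byte per comparison.
def anySet(words: Array[Long]): Boolean = {
  var i = 0
  while (i < words.length) {
    if (words(i) != 0L) return true  // any set bit makes the word non-zero
    i += 1
  }
  false
}
{code}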
[jira] [Commented] (SPARK-5041) hive-exec jar should be generated with JDK 6
[ https://issues.apache.org/jira/browse/SPARK-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523503#comment-14523503 ] Ted Yu commented on SPARK-5041: --- Considering the '[discuss] ending support for Java 6' discussion on the mailing list, it looks like there is no need to do this anymore. > hive-exec jar should be generated with JDK 6 > > > Key: SPARK-5041 > URL: https://issues.apache.org/jira/browse/SPARK-5041 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Ted Yu > Labels: jdk1.7, maven > > Shixiong Zhu first reported the issue where hive-exec-0.12.0-protobuf-2.5.jar > cannot be used by a Spark program running on JDK 6. > See http://search-hadoop.com/m/JW1q5YLCNN > hive-exec-0.12.0-protobuf-2.5.jar was generated with JDK 7. It should be > generated with JDK 6. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7107) Add parameter for zookeeper.znode.parent to hbase_inputformat.py
[ https://issues.apache.org/jira/browse/SPARK-7107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-7107: -- Description: [~yeshavora] first reported encountering the following exception running hbase_inputformat.py: {code} py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD. : java.lang.RuntimeException: java.lang.NullPointerException at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:208) at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:313) at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:288) at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160) {code} It turned out that the hbase cluster has a custom znode parent: {code} <property> <name>zookeeper.znode.parent</name> <value>/hbase-unsecure</value> </property> {code} hbase_inputformat.py should support specification of a custom znode parent. was: We encountered the following exception running hbase_inputformat.py: {code} py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD. : java.lang.RuntimeException: java.lang.NullPointerException at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:208) at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:313) at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:288) at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160) {code} It turned out that the hbase cluster has a custom znode parent: {code} <property> <name>zookeeper.znode.parent</name> <value>/hbase-unsecure</value> </property> {code} hbase_inputformat.py should support specification of a custom znode parent. > Add parameter for zookeeper.znode.parent to hbase_inputformat.py > > > Key: SPARK-7107 > URL: https://issues.apache.org/jira/browse/SPARK-7107 > Project: Spark > Issue Type: Bug >Reporter: Ted Yu >Priority: Minor > > [~yeshavora] first reported encountering the following exception running > hbase_inputformat.py: > {code} > py4j.protocol.Py4JJavaError: An error occurred while calling > z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD. > : java.lang.RuntimeException: java.lang.NullPointerException > at > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:208) > at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:313) > at > org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:288) > at > org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160) > {code} > It turned out that the hbase cluster has a custom znode parent: > {code} > <property> > <name>zookeeper.znode.parent</name> > <value>/hbase-unsecure</value> > </property> > {code} > hbase_inputformat.py should support specification of a custom znode parent. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-7107) Add parameter for zookeeper.znode.parent to hbase_inputformat.py
Ted Yu created SPARK-7107: - Summary: Add parameter for zookeeper.znode.parent to hbase_inputformat.py Key: SPARK-7107 URL: https://issues.apache.org/jira/browse/SPARK-7107 Project: Spark Issue Type: Bug Reporter: Ted Yu Priority: Minor We encountered the following exception running hbase_inputformat.py: {code} py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD. : java.lang.RuntimeException: java.lang.NullPointerException at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:208) at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:313) at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:288) at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160) {code} It turned out that the hbase cluster has a custom znode parent: {code} <property> <name>zookeeper.znode.parent</name> <value>/hbase-unsecure</value> </property> {code} hbase_inputformat.py should support specification of a custom znode parent. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6085) Increase default value for memory overhead
[ https://issues.apache.org/jira/browse/SPARK-6085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342398#comment-14342398 ] Ted Yu commented on SPARK-6085: --- In my opinion, the priority for this JIRA should be Major. Users who deploy Spark on YARN in production are highly likely to hit computation failure(s). This would impact their business. Without intimate knowledge of Spark, it would take them some time to figure out the root cause. > Increase default value for memory overhead > -- > > Key: SPARK-6085 > URL: https://issues.apache.org/jira/browse/SPARK-6085 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Ted Yu >Priority: Minor > > Several users have communicated how the current default memory overhead value > resulted in failed computations in Spark on YARN. > See this thread: > http://search-hadoop.com/m/JW1q58FDel > Increasing the default value for memory overhead would improve the out-of-the-box > user experience. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-6085) Increase default value for memory overhead
Ted Yu created SPARK-6085: - Summary: Increase default value for memory overhead Key: SPARK-6085 URL: https://issues.apache.org/jira/browse/SPARK-6085 Project: Spark Issue Type: Improvement Reporter: Ted Yu Several users have communicated how the current default memory overhead value resulted in failed computations in Spark on YARN. See this thread: http://search-hadoop.com/m/JW1q58FDel Increasing the default value for memory overhead would improve the out-of-the-box user experience. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
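Until the default changes, users can raise the overhead explicitly. A sketch using the Spark-on-YARN configuration key; the value is an example, not a recommendation:
{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.yarn.executor.memoryOverhead", "768")  // value in MB, illustrative
{code}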
[jira] [Updated] (SPARK-5427) Add support for floor function in Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-5427: -- Labels: math (was: ) > Add support for floor function in Spark SQL > --- > > Key: SPARK-5427 > URL: https://issues.apache.org/jira/browse/SPARK-5427 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Ted Yu > Labels: math > > floor() function is supported in Hive SQL. > This issue is to add floor() function to Spark SQL. > Related thread: http://search-hadoop.com/m/JW1q563fc22 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6045) RecordWriter should be checked against null in PairRDDFunctions#saveAsNewAPIHadoopDataset
[ https://issues.apache.org/jira/browse/SPARK-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339169#comment-14339169 ] Ted Yu commented on SPARK-6045: --- The logic of CassandraHadoopMigrator.scala is unknown. With the current PR, the user would be able to see the exception earlier so that he / she can perform proper analysis. > RecordWriter should be checked against null in > PairRDDFunctions#saveAsNewAPIHadoopDataset > - > > Key: SPARK-6045 > URL: https://issues.apache.org/jira/browse/SPARK-6045 > Project: Spark > Issue Type: Bug >Reporter: Ted Yu > > gtinside reported in the thread 'NullPointerException in TaskSetManager' with > the following stack trace: > {code} > WARN 2015-02-26 14:21:43,217 [task-result-getter-0] TaskSetManager - Lost > task 14.2 in stage 0.0 (TID 29, devntom003.dev.blackrock.com): > java.lang.NullPointerException > org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1007) > com.bfm.spark.test.CassandraHadoopMigrator$.main(CassandraHadoopMigrator.scala:77) > com.bfm.spark.test.CassandraHadoopMigrator.main(CassandraHadoopMigrator.scala) > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > java.lang.reflect.Method.invoke(Method.java:606) > org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358) > org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75) > org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} > Looks like the following call in the finally block was the cause: > {code} > writer.close(hadoopContext) > {code} > We should check writer against null before calling close(). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6045) RecordWriter should be checked against null in PairRDDFunctions#saveAsNewAPIHadoopDataset
[ https://issues.apache.org/jira/browse/SPARK-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339147#comment-14339147 ] Ted Yu commented on SPARK-6045: --- https://github.com/apache/spark/pull/4794 > RecordWriter should be checked against null in > PairRDDFunctions#saveAsNewAPIHadoopDataset > - > > Key: SPARK-6045 > URL: https://issues.apache.org/jira/browse/SPARK-6045 > Project: Spark > Issue Type: Bug >Reporter: Ted Yu > > gtinside reported in the thread 'NullPointerException in TaskSetManager' with > the following stack trace: > {code} > WARN 2015-02-26 14:21:43,217 [task-result-getter-0] TaskSetManager - Lost > task 14.2 in stage 0.0 (TID 29, devntom003.dev.blackrock.com): > java.lang.NullPointerException > org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1007) > com.bfm.spark.test.CassandraHadoopMigrator$.main(CassandraHadoopMigrator.scala:77) > com.bfm.spark.test.CassandraHadoopMigrator.main(CassandraHadoopMigrator.scala) > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > java.lang.reflect.Method.invoke(Method.java:606) > org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358) > org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75) > org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} > Looks like the following call in finally block was the cause: > {code} > writer.close(hadoopContext) > {code} > We should check writer against null before calling close(). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-6045) RecordWriter should be checked against null in PairRDDFunctions#saveAsNewAPIHadoopDataset
Ted Yu created SPARK-6045: - Summary: RecordWriter should be checked against null in PairRDDFunctions#saveAsNewAPIHadoopDataset Key: SPARK-6045 URL: https://issues.apache.org/jira/browse/SPARK-6045 Project: Spark Issue Type: Bug Reporter: Ted Yu gtinside reported in the thread 'NullPointerException in TaskSetManager' with the following stack trace: {code} WARN 2015-02-26 14:21:43,217 [task-result-getter-0] TaskSetManager - Lost task 14.2 in stage 0.0 (TID 29, devntom003.dev.blackrock.com): java.lang.NullPointerException org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1007) com.bfm.spark.test.CassandraHadoopMigrator$.main(CassandraHadoopMigrator.scala:77) com.bfm.spark.test.CassandraHadoopMigrator.main(CassandraHadoopMigrator.scala) sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) java.lang.reflect.Method.invoke(Method.java:606) org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358) org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75) org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) {code} Looks like the following call in the finally block was the cause: {code} writer.close(hadoopContext) {code} We should check writer against null before calling close(). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5427) Add support for floor function in Spark SQL
Ted Yu created SPARK-5427: - Summary: Add support for floor function in Spark SQL Key: SPARK-5427 URL: https://issues.apache.org/jira/browse/SPARK-5427 Project: Spark Issue Type: Improvement Reporter: Ted Yu The floor() function is supported in Hive SQL. This issue is to add the floor() function to Spark SQL. Related thread: http://search-hadoop.com/m/JW1q563fc22 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
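For illustration, usage once floor() is wired into Spark SQL would mirror Hive semantics (largest integral value not greater than the argument); the table and column names here are hypothetical:
{code}
// Hypothetical query once floor() is supported, matching the Hive SQL behavior.
val rounded = sqlContext.sql("SELECT name, floor(price) FROM products")
rounded.collect().foreach(println)
{code}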
[jira] [Commented] (SPARK-1714) Take advantage of AMRMClient APIs to simplify logic in YarnAllocationHandler
[ https://issues.apache.org/jira/browse/SPARK-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286416#comment-14286416 ] Ted Yu commented on SPARK-1714: --- The allocatedHostToContainersMap.synchronized guard is absent for the following operation in runAllocatedContainers(): {code} val containerSet = allocatedHostToContainersMap.getOrElseUpdate(executorHostname, new HashSet[ContainerId]) containerSet += containerId allocatedContainerToHostMap.put(containerId, executorHostname) {code} Is that intentional? > Take advantage of AMRMClient APIs to simplify logic in YarnAllocationHandler > > > Key: SPARK-1714 > URL: https://issues.apache.org/jira/browse/SPARK-1714 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.2.0 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 1.3.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
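If the update is meant to be guarded like other accesses to the map, the block would presumably be wrapped as follows (a sketch reusing the names from the snippet above, not the actual patch):
{code}
// Sketch: take the same lock used elsewhere for allocatedHostToContainersMap
// so concurrent container allocations cannot corrupt either map.
allocatedHostToContainersMap.synchronized {
  val containerSet = allocatedHostToContainersMap.getOrElseUpdate(
    executorHostname, new HashSet[ContainerId])
  containerSet += containerId
  allocatedContainerToHostMap.put(containerId, executorHostname)
}
{code}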
[jira] [Commented] (SPARK-1714) Take advantage of AMRMClient APIs to simplify logic in YarnAllocationHandler
[ https://issues.apache.org/jira/browse/SPARK-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286407#comment-14286407 ] Ted Yu commented on SPARK-1714: --- {code} if (completedContainer.getExitStatus == -103) { // vmem limit exceeded {code} Should ContainerExitStatus#KILLED_EXCEEDED_VMEM be referenced above? > Take advantage of AMRMClient APIs to simplify logic in YarnAllocationHandler > > > Key: SPARK-1714 > URL: https://issues.apache.org/jira/browse/SPARK-1714 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.2.0 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 1.3.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
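For reference, YARN defines ContainerExitStatus.KILLED_EXCEEDED_VMEM as -103, so the comparison could read as follows (a sketch of the suggested change):
{code}
import org.apache.hadoop.yarn.api.records.ContainerExitStatus

// Sketch: the named constant makes the intent explicit and avoids the magic number.
if (completedContainer.getExitStatus == ContainerExitStatus.KILLED_EXCEEDED_VMEM) {
  // vmem limit exceeded
}
{code}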
[jira] [Updated] (SPARK-5041) hive-exec jar should be generated with JDK 6
[ https://issues.apache.org/jira/browse/SPARK-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-5041: -- Labels: jdk1.7 maven (was: maven) > hive-exec jar should be generated with JDK 6 > > > Key: SPARK-5041 > URL: https://issues.apache.org/jira/browse/SPARK-5041 > Project: Spark > Issue Type: Bug >Reporter: Ted Yu > Labels: jdk1.7, maven > > Shixiong Zhu first reported the issue where hive-exec-0.12.0-protobuf-2.5.jar > cannot be used by Spark program running JDK 6. > See http://search-hadoop.com/m/JW1q5YLCNN > hive-exec-0.12.0-protobuf-2.5.jar was generated with JDK 7. It should be > generated with JDK 6. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5041) hive-exec jar should be generated with JDK 6
[ https://issues.apache.org/jira/browse/SPARK-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-5041: -- Labels: maven (was: ) > hive-exec jar should be generated with JDK 6 > > > Key: SPARK-5041 > URL: https://issues.apache.org/jira/browse/SPARK-5041 > Project: Spark > Issue Type: Bug >Reporter: Ted Yu > Labels: maven > > Shixiong Zhu first reported the issue where hive-exec-0.12.0-protobuf-2.5.jar > cannot be used by Spark program running JDK 6. > See http://search-hadoop.com/m/JW1q5YLCNN > hive-exec-0.12.0-protobuf-2.5.jar was generated with JDK 7. It should be > generated with JDK 6. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5041) hive-exec jar should be generated with JDK 6
Ted Yu created SPARK-5041: - Summary: hive-exec jar should be generated with JDK 6 Key: SPARK-5041 URL: https://issues.apache.org/jira/browse/SPARK-5041 Project: Spark Issue Type: Bug Reporter: Ted Yu Shixiong Zhu first reported the issue where hive-exec-0.12.0-protobuf-2.5.jar cannot be used by a Spark program running on JDK 6. See http://search-hadoop.com/m/JW1q5YLCNN hive-exec-0.12.0-protobuf-2.5.jar was generated with JDK 7. It should be generated with JDK 6. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
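One way to confirm which JDK a jar was built for is to read the class-file major version from its bytecode header (major version 50 corresponds to Java 6, 51 to Java 7). A self-contained sketch; the jar path is taken from the report above and the choice of entry is arbitrary:
{code}
import java.io.DataInputStream
import java.util.jar.JarFile
import scala.collection.JavaConverters._

// Sketch: open the jar, pick the first .class entry, and read its header.
// A class file starts with the 0xCAFEBABE magic (4 bytes), followed by the
// minor and major version (2 bytes each); major 50 = JDK 6, 51 = JDK 7.
val jar = new JarFile("hive-exec-0.12.0-protobuf-2.5.jar")
val entry = jar.entries().asScala.find(_.getName.endsWith(".class")).get
val in = new DataInputStream(jar.getInputStream(entry))
val magic = in.readInt()             // expect 0xCAFEBABE
val minor = in.readUnsignedShort()
val major = in.readUnsignedShort()
println(s"${entry.getName}: major=$major (50 = JDK 6, 51 = JDK 7)")
in.close()
jar.close()
{code}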
[jira] [Commented] (SPARK-1127) Add saveAsHBase to PairRDDFunctions
[ https://issues.apache.org/jira/browse/SPARK-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1423#comment-1423 ] Ted Yu commented on SPARK-1127: --- According to Reynold, the first half of the external data source API (for reading, but not writing) is already in 1.2: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala > Add saveAsHBase to PairRDDFunctions > --- > > Key: SPARK-1127 > URL: https://issues.apache.org/jira/browse/SPARK-1127 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: haosdent huang >Assignee: haosdent huang > Fix For: 1.2.0 > > > Support to save data in HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
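For context, a minimal read-only source against those 1.2 interfaces might look roughly like this (a sketch based on interfaces.scala; package layout and signatures shifted in later releases, so treat the details as assumptions):
{code}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql._ // Row, SQLContext, StructType, etc. in 1.2
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}

// Sketch: a trivial provider exposing a one-column table of integers.
class DefaultSource extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    new IntRelation(sqlContext)
}

class IntRelation(val sqlContext: SQLContext) extends TableScan {
  override def schema: StructType =
    StructType(StructField("value", IntegerType, nullable = false) :: Nil)
  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.parallelize(1 to 10).map(Row(_))
}
{code}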
[jira] [Created] (SPARK-4455) Exclude dependency on hbase-annotations module
Ted Yu created SPARK-4455: - Summary: Exclude dependency on hbase-annotations module Key: SPARK-4455 URL: https://issues.apache.org/jira/browse/SPARK-4455 Project: Spark Issue Type: Bug Reporter: Ted Yu As Patrick mentioned in the thread 'Has anyone else observed this build break?' : The error I've seen is this when building the examples project: {code} spark-examples_2.10: Could not resolve dependencies for project org.apache.spark:spark-examples_2.10:jar:1.2.0-SNAPSHOT: Could not find artifact jdk.tools:jdk.tools:jar:1.7 at specified path /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/../lib/tools.jar {code} The reason for this error is that hbase-annotations is using a "system" scoped dependency in its hbase-annotations pom, and this doesn't work with certain JDK layouts such as that provided on Mac OS: http://central.maven.org/maven2/org/apache/hbase/hbase-annotations/0.98.7-hadoop2/hbase-annotations-0.98.7-hadoop2.pom Since the hbase-annotations module is transitively brought in through other HBase modules, we should exclude it from the related modules. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
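The fix would be a Maven exclusion on each module that pulls in HBase transitively; a sketch (the artifact shown and the version property are illustrative, not the actual patch):
{code}
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>${hbase.version}</version>
  <exclusions>
    <!-- hbase-annotations declares a system-scoped jdk.tools dependency
         that breaks resolution on some JDK layouts (e.g. Mac OS). -->
    <exclusion>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-annotations</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{code}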
[jira] [Commented] (SPARK-1297) Upgrade HBase dependency to 0.98.0
[ https://issues.apache.org/jira/browse/SPARK-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198701#comment-14198701 ] Ted Yu commented on SPARK-1297: --- Create a new pull request: https://github.com/apache/spark/pull/3115 > Upgrade HBase dependency to 0.98.0 > -- > > Key: SPARK-1297 > URL: https://issues.apache.org/jira/browse/SPARK-1297 > Project: Spark > Issue Type: Task >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Minor > Attachments: pom.xml, spark-1297-v2.txt, spark-1297-v4.txt, > spark-1297-v5.txt, spark-1297-v6.txt, spark-1297-v7.txt > > > HBase 0.94.6 was released 11 months ago. > Upgrade HBase dependency to 0.98.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1297) Upgrade HBase dependency to 0.98.0
[ https://issues.apache.org/jira/browse/SPARK-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198510#comment-14198510 ] Ted Yu commented on SPARK-1297: --- Patch v7 uses 0.98.7 hbase release > Upgrade HBase dependency to 0.98.0 > -- > > Key: SPARK-1297 > URL: https://issues.apache.org/jira/browse/SPARK-1297 > Project: Spark > Issue Type: Task >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Minor > Attachments: pom.xml, spark-1297-v2.txt, spark-1297-v4.txt, > spark-1297-v5.txt, spark-1297-v6.txt, spark-1297-v7.txt > > > HBase 0.94.6 was released 11 months ago. > Upgrade HBase dependency to 0.98.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-1297) Upgrade HBase dependency to 0.98.0
[ https://issues.apache.org/jira/browse/SPARK-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated SPARK-1297: -- Attachment: spark-1297-v7.txt > Upgrade HBase dependency to 0.98.0 > -- > > Key: SPARK-1297 > URL: https://issues.apache.org/jira/browse/SPARK-1297 > Project: Spark > Issue Type: Task >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Minor > Attachments: pom.xml, spark-1297-v2.txt, spark-1297-v4.txt, > spark-1297-v5.txt, spark-1297-v6.txt, spark-1297-v7.txt > > > HBase 0.94.6 was released 11 months ago. > Upgrade HBase dependency to 0.98.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4066) Make whether maven builds fails on scalastyle violation configurable
[ https://issues.apache.org/jira/browse/SPARK-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183016#comment-14183016 ] Ted Yu commented on SPARK-4066: --- bq. -Dscalastyle.failOnViolation was already a built-in way to control this See response from Koert: {noformat} i tried: mvn clean package -DskipTests -Dscalastyle.failOnViolation=false no luck, still get {noformat} bq. this has to be fixed anyway I agree that this needs to be done before patch submission. However, when formulating the patch, such a check can be skipped. > Make whether maven builds fails on scalastyle violation configurable > > > Key: SPARK-4066 > URL: https://issues.apache.org/jira/browse/SPARK-4066 > Project: Spark > Issue Type: Improvement >Reporter: Ted Yu >Priority: Minor > Attachments: spark-4066-v1.txt > > > Here is the thread Koert started: > http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bit&subj=scalastyle+annoys+me+a+little+bit > It would be more flexible if whether the maven build fails on a scalastyle > violation were configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-4066) Make whether maven builds fails on scalastyle violation configurable
Ted Yu created SPARK-4066: - Summary: Make whether maven builds fails on scalastyle violation configurable Key: SPARK-4066 URL: https://issues.apache.org/jira/browse/SPARK-4066 Project: Spark Issue Type: Improvement Reporter: Ted Yu Priority: Minor Here is the thread Koert started: http://search-hadoop.com/m/JW1q5j8z422/scalastyle+annoys+me+a+little+bit&subj=scalastyle+annoys+me+a+little+bit It would be more flexible if whether the maven build fails on a scalastyle violation were configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
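One way to make this configurable is to route a user-settable property into the scalastyle-maven-plugin's failOnViolation flag, so an override such as -Dscalastyle.failOnViolation=false takes effect; a sketch (plugin version omitted, property name assumed):
{code}
<properties>
  <!-- default: keep failing the build on violations -->
  <scalastyle.failOnViolation>true</scalastyle.failOnViolation>
</properties>

<plugin>
  <groupId>org.scalastyle</groupId>
  <artifactId>scalastyle-maven-plugin</artifactId>
  <configuration>
    <!-- override from the command line with -Dscalastyle.failOnViolation=false -->
    <failOnViolation>${scalastyle.failOnViolation}</failOnViolation>
  </configuration>
</plugin>
{code}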