[jira] [Created] (SPARK-11023) Error initializing SparkContext. java.net.URISyntaxException
Jose Antonio created SPARK-11023:
------------------------------------
Summary: Error initializing SparkContext. java.net.URISyntaxException
Key: SPARK-11023
URL: https://issues.apache.org/jira/browse/SPARK-11023
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 1.5.1, 1.5.0
Environment: pyspark + windows
Reporter: Jose Antonio

Similar to SPARK-10326. [https://issues.apache.org/jira/browse/SPARK-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949470#comment-14949470]

C:\WINDOWS\system32>pyspark --master yarn-client
Python 2.7.10 |Anaconda 2.3.0 (64-bit)| (default, Sep 15 2015, 14:26:14) [MSC v.1500 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.
IPython 4.0.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
15/10/08 09:28:05 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/10/08 09:28:06 WARN : Your hostname, PC-509512 resolves to a loopback/non-reachable address: fe80:0:0:0:0:5efe:a5f:c318%net3, but we couldn't find any external IP address!
15/10/08 09:28:08 WARN BlockReaderLocal: The short-circuit local reads feature cannot be used because UNIX Domain sockets are not available on Windows.
15/10/08 09:28:08 ERROR SparkContext: Error initializing SparkContext.
java.net.URISyntaxException: Illegal character in opaque part at index 2: C:\spark\bin\..\python\lib\pyspark.zip
	at java.net.URI$Parser.fail(Unknown Source)
	at java.net.URI$Parser.checkChars(Unknown Source)
	at java.net.URI$Parser.parse(Unknown Source)
	at java.net.URI.<init>(Unknown Source)
	at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$7.apply(Client.scala:558)
	at org.apache.spark.deploy.yarn.Client$$anonfun$setupLaunchEnv$7.apply(Client.scala:557)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:557)
	at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:628)
	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:119)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:523)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
	at java.lang.reflect.Constructor.newInstance(Unknown Source)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
	at py4j.Gateway.invoke(Gateway.java:214)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
	at py4j.GatewayConnection.run(GatewayConnection.java:207)
	at java.lang.Thread.run(Unknown Source)
15/10/08 09:28:08 ERROR Utils: Uncaught exception in thread Thread-2
java.lang.NullPointerException
	at org.apache.spark.network.netty.NettyBlockTransferService.close(NettyBlockTransferService.scala:152)
	at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1228)
	at org.apache.spark.SparkEnv.stop(SparkEnv.scala:100)
	at org.apache.spark.SparkContext$$anonfun$stop$12.apply$mcV$sp(SparkContext.scala:1749)
	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1185)
	at org.apache.spark.SparkContext.stop(SparkContext.scala:1748)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:593)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
	at java.lang.reflect.Constructor.newInstance(Unknown Source)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
	at py4j.Gateway.invoke(Gateway.java:214)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
	at py4j.GatewayConnection.run(GatewayConnection.java:207)
	at java.lang.Thread.run(Unknown Source)
---
Py4JJavaError Traceback (most recent call last)
C:\spark\bin\..\python\pyspark\shell.py in ()
41
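The exception above comes from passing a raw Windows path straight to `java.net.URI`: the parser reads `C:` as a URI scheme and then rejects the backslash, giving exactly the "Illegal character in opaque part at index 2" message in the report. A minimal sketch that reproduces the parse failure and shows the usual `File.toURI()` workaround (this is an illustration of the URI behavior, not Spark's actual fix):

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

public class WindowsPathUri {
    public static void main(String[] args) {
        String path = "C:\\spark\\bin\\..\\python\\lib\\pyspark.zip";
        try {
            // "C:" parses as a scheme; the "\" after it is illegal in the
            // opaque part, so this throws URISyntaxException at index 2.
            new URI(path);
            System.out.println("parsed (unexpected)");
        } catch (URISyntaxException e) {
            System.out.println("URISyntaxException: " + e.getMessage());
        }
        // Routing the path through java.io.File first produces a valid
        // file: URI with forward slashes, on Windows and Unix alike.
        URI ok = new File(path).toURI();
        System.out.println(ok.getScheme()); // prints "file"
    }
}
```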
[jira] [Commented] (SPARK-8333) Spark failed to delete temp directory created by HiveContext
[ https://issues.apache.org/jira/browse/SPARK-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950075#comment-14950075 ]

Dony.Xu commented on SPARK-8333:

When I run the Streaming Java API test on Windows 7, this issue can also be reproduced.

java.io.IOException: Failed to delete: D:\workspace\spark\streaming\target\tmp\1444376717608-0
	at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
	at org.apache.spark.util.Utils.deleteRecursively(Utils.scala)
	at org.apache.spark.streaming.JavaAPISuite.testCheckpointMasterRecovery(JavaAPISuite.java:1728)

> Spark failed to delete temp directory created by HiveContext
> Key: SPARK-8333
> URL: https://issues.apache.org/jira/browse/SPARK-8333
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.4.0
> Environment: Windows7 64bit
> Reporter: sheng
> Priority: Minor
> Labels: Hive, metastore, sparksql
> Attachments: test.tar
>
> Spark 1.4.0 failed to stop SparkContext.
> {code:title=LocalHiveTest.scala|borderStyle=solid}
> val sc = new SparkContext("local", "local-hive-test", new SparkConf())
> val hc = Utils.createHiveContext(sc)
> ... // execute some HiveQL statements
> sc.stop()
> {code}
> sc.stop() failed to execute; it threw the following exception:
> {quote}
> 15/06/13 03:19:06 INFO Utils: Shutdown hook called
> 15/06/13 03:19:06 INFO Utils: Deleting directory C:\Users\moshangcheng\AppData\Local\Temp\spark-d6d3c30e-512e-4693-a436-485e2af4baea
> 15/06/13 03:19:06 ERROR Utils: Exception while deleting Spark temp dir: C:\Users\moshangcheng\AppData\Local\Temp\spark-d6d3c30e-512e-4693-a436-485e2af4baea
> java.io.IOException: Failed to delete: C:\Users\moshangcheng\AppData\Local\Temp\spark-d6d3c30e-512e-4693-a436-485e2af4baea
> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:963)
> at org.apache.spark.util.Utils$$anonfun$1$$anonfun$apply$mcV$sp$5.apply(Utils.scala:204)
> at org.apache.spark.util.Utils$$anonfun$1$$anonfun$apply$mcV$sp$5.apply(Utils.scala:201)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
> at org.apache.spark.util.Utils$$anonfun$1.apply$mcV$sp(Utils.scala:201)
> at org.apache.spark.util.SparkShutdownHook.run(Utils.scala:2292)
> at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(Utils.scala:2262)
> at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(Utils.scala:2262)
> at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(Utils.scala:2262)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
> at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(Utils.scala:2262)
> at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(Utils.scala:2262)
> at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(Utils.scala:2262)
> at scala.util.Try$.apply(Try.scala:161)
> at org.apache.spark.util.SparkShutdownHookManager.runAll(Utils.scala:2262)
> at org.apache.spark.util.SparkShutdownHookManager$$anon$6.run(Utils.scala:2244)
> at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
> {quote}
> It seems this bug was introduced by SPARK-6907. In SPARK-6907, a local hive metastore is created in a temp directory. The problem is that the local hive metastore is not shut down correctly. At the end of the application, when SparkContext.stop() is called, it tries to delete the temp directory, which is still in use by the local hive metastore, and throws an exception.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
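The mechanics behind both reports above: a recursive delete walks children first and then removes the directory, and on Windows `File.delete()` fails for any file that still has an open handle (here, the local metastore's files). A hedged sketch with the same shape as a deleteRecursively helper, written to surface the exact path that could not be removed; this mirrors the behavior described in the ticket, not Spark's source line for line:

```java
import java.io.File;
import java.io.IOException;

public class DeleteRecursively {
    // Delete children first, then the directory itself; report the first
    // path that cannot be removed. On Windows, an open file handle inside
    // the tree is enough to make the delete of that entry fail.
    public static void deleteRecursively(File f) throws IOException {
        File[] children = f.listFiles(); // null for plain files
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        if (!f.delete() && f.exists()) {
            throw new IOException("Failed to delete: " + f.getAbsolutePath());
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"),
                            "spark-demo-" + System.nanoTime());
        new File(dir, "a/b").mkdirs();
        deleteRecursively(dir);          // succeeds once no handle is open
        System.out.println(dir.exists()); // prints "false"
    }
}
```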
[jira] [Created] (SPARK-11025) Exception key can't be empty at getSystemProperties function in utils
Stavros Kontopoulos created SPARK-11025:
---
Summary: Exception key can't be empty at getSystemProperties function in utils
Key: SPARK-11025
URL: https://issues.apache.org/jira/browse/SPARK-11025
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.5.1, 1.4.1, 1.4.0, 1.3.1, 1.3.0
Reporter: Stavros Kontopoulos
Priority: Trivial

In https://github.com/apache/spark/blob/v1.x.x/core/src/main/scala/org/apache/spark/util/Utils.scala, the getSystemProperties function fails when someone passes a bare -D to the JVM, so that the key passed is "" (empty).
Exception thrown: java.lang.IllegalArgumentException: key can't be empty
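A bare `-D` on the JVM command line puts a property with an empty key into the system `Properties` table; the table itself accepts it, but `System.setProperty`-style validation rejects empty keys with exactly the "key can't be empty" message above. A sketch of the defensive copy the ticket argues for, filtering empty keys instead of failing (an assumed approach for illustration, not Spark's exact code):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class SystemProps {
    // Copy a Properties table into a Map, skipping empty keys so that a
    // stray "-D" on the command line cannot abort startup.
    public static Map<String, String> toFilteredMap(Properties props) {
        Map<String, String> out = new HashMap<>();
        for (String name : props.stringPropertyNames()) {
            if (!name.isEmpty()) {
                out.put(name, props.getProperty(name));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.put("", "oops");             // what a bare -D produces
        p.put("spark.master", "local");
        System.out.println(toFilteredMap(p)); // prints {spark.master=local}
    }
}
```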
[jira] [Updated] (SPARK-10679) javax.jdo.JDOFatalUserException in executor
[ https://issues.apache.org/jira/browse/SPARK-10679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-10679: -- Assignee: Reynold Xin > javax.jdo.JDOFatalUserException in executor > --- > > Key: SPARK-10679 > URL: https://issues.apache.org/jira/browse/SPARK-10679 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Navis >Assignee: Reynold Xin >Priority: Minor > Fix For: 1.6.0 > > > HadoopRDD throws exception in executor, something like below. > {noformat} > 5/09/17 18:51:21 INFO metastore.HiveMetaStore: 0: Opening raw store with > implemenation class:org.apache.hadoop.hive.metastore.ObjectStore > 15/09/17 18:51:21 INFO metastore.ObjectStore: ObjectStore, initialize called > 15/09/17 18:51:21 WARN metastore.HiveMetaStore: Retrying creating default > database after error: Class > org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found. > javax.jdo.JDOFatalUserException: Class > org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found. 
> at > javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175) > at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808) > at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701) > at > org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365) > at > org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394) > at > org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:291) > at > org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258) > at > org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) > at > org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:57) > at > org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:66) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72) > at > org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:199) > at > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:74) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:86) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) > at > org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) > at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) > at > org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234) > at > org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174) > at org.apache.hadoop.hive.ql.metadata.Hive.(Hive.java:166) > at > org.apache.hadoop.hive.ql.plan.PlanUtils.configureJobPropertiesForStorageHandler(PlanUtils.java:803) > at > org.apache.hadoop.hive.ql.plan.PlanUtils.configureInputJobPropertiesForStorageHandler(PlanUtils.java:782) > at > org.apache.spark.sql.hive.HadoopTableReader$.initializeLocalJobConfFunc(TableReader.scala:298) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:274) > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:274) > at >
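The JDOFatalUserException above boils down to a classpath problem: the DataNucleus jars are not visible where the metastore is initialized on the executor. A small hedged sketch of how one can probe for a class on the current classpath without letting the error propagate (a generic diagnostic, not something Spark does):

```java
public class ClasspathProbe {
    // Load-check a class by name, swallowing both ClassNotFoundException
    // and linkage errors. Whether the DataNucleus class below resolves
    // depends entirely on which jars were shipped to the JVM.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isPresent("org.datanucleus.api.jdo.JDOPersistenceManagerFactory"));
        System.out.println(isPresent("java.lang.String")); // prints "true"
    }
}
```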
[jira] [Updated] (SPARK-11016) Spark fails when running with a task that requires a more recent version of RoaringBitmaps
[ https://issues.apache.org/jira/browse/SPARK-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-11016: -- Component/s: Spark Core > Spark fails when running with a task that requires a more recent version of > RoaringBitmaps > -- > > Key: SPARK-11016 > URL: https://issues.apache.org/jira/browse/SPARK-11016 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.4.0 >Reporter: Charles Allen > > The following error appears during Kryo init whenever a more recent version > (>0.5.0) of Roaring bitmaps is required by a job. > org/roaringbitmap/RoaringArray$Element was removed in 0.5.0 > {code} > A needed class was not found. This could be due to an error in your runpath. > Missing class: org/roaringbitmap/RoaringArray$Element > java.lang.NoClassDefFoundError: org/roaringbitmap/RoaringArray$Element > at > org.apache.spark.serializer.KryoSerializer$.(KryoSerializer.scala:338) > at > org.apache.spark.serializer.KryoSerializer$.(KryoSerializer.scala) > at > org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:93) > at > org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:237) > at > org.apache.spark.serializer.KryoSerializerInstance.(KryoSerializer.scala:222) > at > org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:138) > at > org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:201) > at > org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:102) > at > org.apache.spark.broadcast.TorrentBroadcast.(TorrentBroadcast.scala:85) > at > org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34) > at > org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:63) > at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1318) > at > 
org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1006) > at > org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1003) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) > at org.apache.spark.SparkContext.withScope(SparkContext.scala:700) > at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1003) > at > org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:818) > at > org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:816) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) > at org.apache.spark.SparkContext.withScope(SparkContext.scala:700) > at org.apache.spark.SparkContext.textFile(SparkContext.scala:816) > {code} > See https://issues.apache.org/jira/browse/SPARK-5949 for related info -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11018) Support UDT in codegen and unsafe projection
[ https://issues.apache.org/jira/browse/SPARK-11018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-11018:
--
Component/s: SQL

> Support UDT in codegen and unsafe projection
> Key: SPARK-11018
> URL: https://issues.apache.org/jira/browse/SPARK-11018
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Davies Liu
> Assignee: Davies Liu
> Priority: Blocker
>
> UDT is not handled correctly in codegen:
> {code}
> failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 41, Column 30: No applicable constructor/method found for actual parameters "int, java.lang.Object"; candidates are:
> "public void org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(int, org.apache.spark.unsafe.types.CalendarInterval)",
> "public void org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(int, org.apache.spark.sql.types.Decimal, int, int)",
> "public void org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(int, org.apache.spark.unsafe.types.UTF8String)",
> "public void org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(int, byte[])"
> {code}
[jira] [Commented] (SPARK-11016) Spark fails when running with a task that requires a more recent version of RoaringBitmaps
[ https://issues.apache.org/jira/browse/SPARK-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950152#comment-14950152 ] Sean Owen commented on SPARK-11016: --- This is my ignorance, but is a proper serializer registered for roaringbitmaps classes in your app (or somehow by kryo by default)? Otherwise, relying on the default serialization may not work, indeed. This isn't a spark problem though. > Spark fails when running with a task that requires a more recent version of > RoaringBitmaps > -- > > Key: SPARK-11016 > URL: https://issues.apache.org/jira/browse/SPARK-11016 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.4.0 >Reporter: Charles Allen > > The following error appears during Kryo init whenever a more recent version > (>0.5.0) of Roaring bitmaps is required by a job. > org/roaringbitmap/RoaringArray$Element was removed in 0.5.0 > {code} > A needed class was not found. This could be due to an error in your runpath. > Missing class: org/roaringbitmap/RoaringArray$Element > java.lang.NoClassDefFoundError: org/roaringbitmap/RoaringArray$Element > at > org.apache.spark.serializer.KryoSerializer$.(KryoSerializer.scala:338) > at > org.apache.spark.serializer.KryoSerializer$.(KryoSerializer.scala) > at > org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:93) > at > org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:237) > at > org.apache.spark.serializer.KryoSerializerInstance.(KryoSerializer.scala:222) > at > org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:138) > at > org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:201) > at > org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:102) > at > org.apache.spark.broadcast.TorrentBroadcast.(TorrentBroadcast.scala:85) > at > org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34) 
> at > org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:63) > at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1318) > at > org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1006) > at > org.apache.spark.SparkContext$$anonfun$hadoopFile$1.apply(SparkContext.scala:1003) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) > at org.apache.spark.SparkContext.withScope(SparkContext.scala:700) > at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1003) > at > org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:818) > at > org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:816) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) > at org.apache.spark.SparkContext.withScope(SparkContext.scala:700) > at org.apache.spark.SparkContext.textFile(SparkContext.scala:816) > {code} > See https://issues.apache.org/jira/browse/SPARK-5949 for related info -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-11025) Exception key can't be empty at getSystemProperties function in utils
[ https://issues.apache.org/jira/browse/SPARK-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950157#comment-14950157 ]

Stavros Kontopoulos edited comment on SPARK-11025 at 10/9/15 10:04 AM:
---
Falling back to the previous impl, System.getProperties.clone().asInstanceOf[java.util.Properties].toMap[String, String], which was ignoring it. At the language level Java does not complain, so I think it is OK to ignore it, unless the general strategy is to catch everything that is wrong; I think we should only validate what we use. I know a bare -D can only come up as a mistake; I just wanted to bring to the table what the strategy is, and whether for such minor mistakes we should fail execution when the Spark config is created.

> Exception key can't be empty at getSystemProperties function in utils
> Key: SPARK-11025
> URL: https://issues.apache.org/jira/browse/SPARK-11025
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1, 1.5.1
> Reporter: Stavros Kontopoulos
> Priority: Trivial
> Labels: easyfix, easytest
>
> At file core/src/main/scala/org/apache/spark/util/Utils.scala the getSystemProperties function fails when someone passes -D to the JVM, and as a result the key passed is "" (empty).
> Exception thrown: java.lang.IllegalArgumentException: key can't be empty
> Empty keys should be ignored, or just passed through without filtering at that level as in previous versions.
[jira] [Updated] (SPARK-10944) Provide self contained deployment not tightly coupled with Hadoop
[ https://issues.apache.org/jira/browse/SPARK-10944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pranas Baliuka updated SPARK-10944:
---
Flags: Patch (was: Patch,Important)
Labels: patch (was: easyfix patch)
Remaining Estimate: (was: 2h)
Original Estimate: (was: 2h)
Priority: Minor (was: Major)

Description:
Attempt to run a Spark cluster on a Mac OS machine fails if Hadoop is not installed. There should be no real need to install a full-blown Hadoop installation just to run Spark.

Current situation:
{code}
# cd $SPARK_HOME
Imin:spark-1.5.1-bin-without-hadoop pranas$ ./sbin/start-master.sh
{code}
Output:
{code}
starting org.apache.spark.deploy.master.Master, logging to /Users/pranas/Apps/spark-1.5.1-bin-without-hadoop/sbin/../logs/spark-pranas-org.apache.spark.deploy.master.Master-1-Imin.local.out
failed to launch org.apache.spark.deploy.master.Master:
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 7 more
full log in /Users/pranas/Apps/spark-1.5.1-bin-without-hadoop/sbin/../logs/spark-pranas-org.apache.spark.deploy.master.Master-1-Imin.local.out
{code}
Log:
{code}
# Options read when launching programs locally with
# ./bin/run-example or ./bin/spark-submit
Spark Command: /Library/Java/JavaVirtualMachines/jdk1.8.0_40.jdk/Contents/Home/bin/java -cp /Users/pranas/Apps/spark-1.5.1-bin-without-hadoop/sbin/../conf/:/Users/pranas/Apps/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.2.0.jar -Xms1g -Xmx1g org.apache.spark.deploy.master.Master --ip Imin.local --port 7077 --webui-port 8080

Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/Logger
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
	at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
	at java.lang.Class.getMethod0(Class.java:3018)
	at java.lang.Class.getMethod(Class.java:1784)
	at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.slf4j.Logger
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
{code}
Proposed short term fix: Bundle all required 3rd-party libs into the uberjar and/or fix the start-up script to include the required 3rd-party libs.
Long term quality improvement proposal: Introduce integration tests to check the distribution before releasing.
[jira] [Reopened] (SPARK-10944) Provide self contained deployment not tightly coupled with Hadoop
[ https://issues.apache.org/jira/browse/SPARK-10944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pranas Baliuka reopened SPARK-10944:
---
Updated as a feature request.
[jira] [Updated] (SPARK-11022) Spark Worker process find Memory leaking after long time running
[ https://issues.apache.org/jira/browse/SPARK-11022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-11022: -- Priority: Minor (was: Major) Can you update the title to be more clear about the cause and resolution? You are specifically suggesting that the list of executors needs to be garbage collected. (Do you really have 17K executors, most of which are dead, in one app?) > Spark Worker process find Memory leaking after long time running > > > Key: SPARK-11022 > URL: https://issues.apache.org/jira/browse/SPARK-11022 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.4.0 >Reporter: colin shaw >Priority: Minor > > The Worker process often goes down while there were not any abnormal tasks; it just crashes > without any message. After adding "-XX:+HeapDumpOnOutOfMemoryError > -XX:HeapDumpPath=${SPARK_HOME}/logs", a dump file shows there are "17,010 > instances of "org.apache.spark.deploy.worker.ExecutorRunner", loaded by > "sun.misc.Launcher$AppClassLoader @ 0xe2abfcc8" occupy 496,706,920 (96.14%) > bytes. " > and almost all the instances were stored in a > "org.apache.spark.deploy.worker.Worker" instance; the finishedExecutors field > holds many ExecutorRunner objects. > The code (Worker.scala) shows finishedExecutors only does > "finishedExecutors(fullId) = executor" and > "finishedExecutors.values.toList"; there is no action which removes the > Executor, so all stay in memory and, after long running, the process crashes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
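The leak described in the report above comes from `finishedExecutors` growing without bound. One common mitigation for this kind of problem is to cap the number of retained finished entries and evict the oldest. A hedged sketch in plain Python (the class and method names are hypothetical; this is not the actual Worker.scala patch):

```python
from collections import OrderedDict

class FinishedExecutorRegistry:
    """Keeps at most max_retained finished executors, evicting the oldest.

    Hypothetical illustration of bounding a cache that was kept
    unbounded; names do not match the real Spark code.
    """

    def __init__(self, max_retained=1000):
        self.max_retained = max_retained
        self._finished = OrderedDict()

    def add(self, full_id, executor_runner):
        self._finished[full_id] = executor_runner
        # Evict oldest entries once the cap is exceeded
        while len(self._finished) > self.max_retained:
            self._finished.popitem(last=False)

    def values(self):
        return list(self._finished.values())

registry = FinishedExecutorRegistry(max_retained=3)
for i in range(5):
    registry.add("app-%d" % i, object())
print(len(registry.values()))  # 3: older entries were evicted
```

With a cap like this, memory used by dead executors stays constant no matter how long the Worker runs.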
[jira] [Created] (SPARK-11024) Optimize NULL in by folding it to Literal(null)
Dilip Biswal created SPARK-11024: Summary: Optimize NULL in by folding it to Literal(null) Key: SPARK-11024 URL: https://issues.apache.org/jira/browse/SPARK-11024 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.1 Reporter: Dilip Biswal Priority: Minor Add a rule in optimizer to convert NULL [NOT] IN (expr1,...,expr2) to Literal(null). This is a follow up defect to SPARK-8654 and suggested by Wenchen Fan. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11024) Optimize NULL in by folding it to Literal(null)
[ https://issues.apache.org/jira/browse/SPARK-11024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950124#comment-14950124 ] Dilip Biswal commented on SPARK-11024: -- I am currently working on a PR for this issue. > Optimize NULL in by folding it to Literal(null) > > > Key: SPARK-11024 > URL: https://issues.apache.org/jira/browse/SPARK-11024 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.1 >Reporter: Dilip Biswal >Priority: Minor > > Add a rule in optimizer to convert NULL [NOT] IN (expr1,...,expr2) to > Literal(null). > This is a follow up defect to SPARK-8654 and suggested by Wenchen Fan. > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
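The proposed rewrite is justified by SQL's three-valued logic: `NULL [NOT] IN (...)` always evaluates to NULL regardless of the list contents, so it can be folded to `Literal(null)`. A small Python model of the IN semantics, using `None` for SQL NULL (illustrative only, not Catalyst code):

```python
def sql_in(value, candidates):
    """SQL three-valued IN: returns True, False, or None (unknown)."""
    if value is None:
        return None  # NULL IN (...) is always NULL
    if any(c is not None and c == value for c in candidates):
        return True
    # No match; if any candidate is NULL the result is unknown
    return None if any(c is None for c in candidates) else False

print(sql_in(None, [1, 2, 3]))   # None: safe to fold to Literal(null)
print(sql_in(2, [1, 2, 3]))      # True
print(sql_in(4, [1, None, 3]))   # None: NULL candidate, no match
```

Because the first branch never inspects the list, the optimizer can constant-fold the whole predicate whenever the left-hand side is a null literal.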
[jira] [Updated] (SPARK-11025) Exception key can't be empty at getSystemProperties function in utils
[ https://issues.apache.org/jira/browse/SPARK-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-11025: Description: At file core/src/main/scala/org/apache/spark/util/Utils.scala getSystemProperties function fails when someone passes -D to the jvm and as a result the key passed is "" (empty). Exception thrown: java.lang.IllegalArgumentException: key can't be empty was: At file ../core/src/main/scala/org/apache/spark/util/Utils.scala getSystemProperties function fails when someone passes -D to the jvm and as a result the key passed is "" (empty). Exception thrown: java.lang.IllegalArgumentException: key can't be empty > Exception key can't be empty at getSystemProperties function in utils > -- > > Key: SPARK-11025 > URL: https://issues.apache.org/jira/browse/SPARK-11025 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1, 1.5.1 >Reporter: Stavros Kontopoulos >Priority: Trivial > Labels: easyfix, easytest > > At file core/src/main/scala/org/apache/spark/util/Utils.scala > getSystemProperties function fails when someone passes -D to the jvm and as a > result the key passed is "" (empty). > Exception thrown: java.lang.IllegalArgumentException: key can't be empty -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11025) Exception key can't be empty at getSystemProperties function in utils
[ https://issues.apache.org/jira/browse/SPARK-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-11025: Description: At file core/src/main/scala/org/apache/spark/util/Utils.scala getSystemProperties function fails when someone passes -D to the jvm and as a result the key passed is "" (empty). Exception thrown: java.lang.IllegalArgumentException: key can't be empty Empty keys should be ignored i think. was: At file core/src/main/scala/org/apache/spark/util/Utils.scala getSystemProperties function fails when someone passes -D to the jvm and as a result the key passed is "" (empty). Exception thrown: java.lang.IllegalArgumentException: key can't be empty > Exception key can't be empty at getSystemProperties function in utils > -- > > Key: SPARK-11025 > URL: https://issues.apache.org/jira/browse/SPARK-11025 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1, 1.5.1 >Reporter: Stavros Kontopoulos >Priority: Trivial > Labels: easyfix, easytest > > At file core/src/main/scala/org/apache/spark/util/Utils.scala > getSystemProperties function fails when someone passes -D to the jvm and as a > result the key passed is "" (empty). > Exception thrown: java.lang.IllegalArgumentException: key can't be empty > Empty keys should be ignored i think. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11025) Exception key can't be empty at getSystemProperties function in utils
[ https://issues.apache.org/jira/browse/SPARK-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-11025: Description: At file ../core/src/main/scala/org/apache/spark/util/Utils.scala getSystemProperties function fails when someone passes -D to the jvm and as a result the key passed is "" (empty). Exception thrown: java.lang.IllegalArgumentException: key can't be empty was: At file https://github.com/apache/spark/blob/v1.x.x/core/src/main/scala/org/apache/spark/util/Utils.scala getSystemProperties function fails when someone passes -D to the jvm and as a result the key passed is "" (empty). Exception thrown: java.lang.IllegalArgumentException: key can't be empty > Exception key can't be empty at getSystemProperties function in utils > -- > > Key: SPARK-11025 > URL: https://issues.apache.org/jira/browse/SPARK-11025 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1, 1.5.1 >Reporter: Stavros Kontopoulos >Priority: Trivial > Labels: easyfix, easytest > > At file ../core/src/main/scala/org/apache/spark/util/Utils.scala > getSystemProperties function fails when someone passes -D to the jvm and as a > result the key passed is "" (empty). > Exception thrown: java.lang.IllegalArgumentException: key can't be empty -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11025) Exception key can't be empty at getSystemProperties function in utils
[ https://issues.apache.org/jira/browse/SPARK-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-11025: Description: At file core/src/main/scala/org/apache/spark/util/Utils.scala getSystemProperties function fails when someone passes -D to the jvm and as a result the key passed is "" (empty). Exception thrown: java.lang.IllegalArgumentException: key can't be empty Empty keys should be ignored or just passed them without filtering at that level as in previous versions. was: At file core/src/main/scala/org/apache/spark/util/Utils.scala getSystemProperties function fails when someone passes -D to the jvm and as a result the key passed is "" (empty). Exception thrown: java.lang.IllegalArgumentException: key can't be empty Empty keys should be ignored at that level as in previous versions. > Exception key can't be empty at getSystemProperties function in utils > -- > > Key: SPARK-11025 > URL: https://issues.apache.org/jira/browse/SPARK-11025 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1, 1.5.1 >Reporter: Stavros Kontopoulos >Priority: Trivial > Labels: easyfix, easytest > > At file core/src/main/scala/org/apache/spark/util/Utils.scala > getSystemProperties function fails when someone passes -D to the jvm and as a > result the key passed is "" (empty). > Exception thrown: java.lang.IllegalArgumentException: key can't be empty > Empty keys should be ignored or just passed them without filtering at that > level as in previous versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8654) Analysis exception when using "NULL IN (...)": invalid cast
[ https://issues.apache.org/jira/browse/SPARK-8654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-8654: - Assignee: Dilip Biswal > Analysis exception when using "NULL IN (...)": invalid cast > --- > > Key: SPARK-8654 > URL: https://issues.apache.org/jira/browse/SPARK-8654 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Santiago M. Mola >Assignee: Dilip Biswal >Priority: Minor > > The following query throws an analysis exception: > {code} > SELECT * FROM t WHERE NULL NOT IN (1, 2, 3); > {code} > The exception is: > {code} > org.apache.spark.sql.AnalysisException: invalid cast from int to null; > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:66) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:52) > {code} > Here is a test that can be added to AnalysisSuite to check the issue: > {code} > test("SPARK- regression test") { > val plan = Project(Alias(In(Literal(null), Seq(Literal(1), Literal(2))), > "a")() :: Nil, > LocalRelation() > ) > caseInsensitiveAnalyze(plan) > } > {code} > Note that this kind of query is a corner case, but it is still valid SQL. An > expression such as "NULL IN (...)" or "NULL NOT IN (...)" always gives NULL > as a result, even if the list contains NULL. So it is safe to translate these > expressions to Literal(null) during analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11014) RPC Time Out Exceptions
[ https://issues.apache.org/jira/browse/SPARK-11014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-11014: -- Component/s: YARN > RPC Time Out Exceptions > --- > > Key: SPARK-11014 > URL: https://issues.apache.org/jira/browse/SPARK-11014 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.5.1 > Environment: YARN >Reporter: Gurpreet Singh > > I am seeing lots of the following RPC exception messages in YARN logs: > > 15/10/08 13:04:27 WARN executor.Executor: Issue communicating with driver in > heartbeater > org.apache.spark.SparkException: Error sending message [message = > Heartbeat(437,[Lscala.Tuple2;@34199eb1,BlockManagerId(437, > phxaishdc9dn1294.stratus.phx.ebay.com, 47480))] > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:118) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77) > at > org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:452) > at > org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:472) > at > org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:472) > at > org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:472) > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699) > at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:472) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.spark.rpc.RpcTimeoutException: Futures timed out after > [120 seconds]. This timeout is controlled by spark.rpc.askTimeout > at > org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcEnv.scala:214) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcEnv.scala:229) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcEnv.scala:225) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcEnv.scala:242) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) > ... 14 more > Caused by: java.util.concurrent.TimeoutException: Futures timed out after > [120 seconds] > at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) > at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) > at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) > at > scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) > at scala.concurrent.Await$.result(package.scala:107) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcEnv.scala:241) > ... 15 more > ## -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10973) __gettitem__ method throws IndexError exception when we try to access index after the last non-zero entry.
[ https://issues.apache.org/jira/browse/SPARK-10973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-10973: -- Labels: backport-needed (was: ) > __gettitem__ method throws IndexError exception when we try to access index > after the last non-zero entry. > -- > > Key: SPARK-10973 > URL: https://issues.apache.org/jira/browse/SPARK-10973 > Project: Spark > Issue Type: Bug > Components: MLlib, PySpark >Affects Versions: 1.3.0, 1.4.0, 1.5.0, 1.6.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz > Labels: backport-needed > Fix For: 1.6.0 > > > \_\_gettitem\_\_ method throws IndexError exception when we try to access > index after the last non-zero entry. > {code} > from pyspark.mllib.linalg import Vectors > sv = Vectors.sparse(5, {1: 3}) > sv[0] > ## 0.0 > sv[1] > ## 3.0 > sv[2] > ## Traceback (most recent call last): > ## File "", line 1, in > ## File "/python/pyspark/mllib/linalg/__init__.py", line 734, in __getitem__ > ## row_ind = inds[insert_index] > ## IndexError: index out of bounds > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
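The fix for the access pattern above is to treat any in-range index that is absent from the stored indices as an implicit zero, and to raise IndexError only for truly out-of-range indices. A standalone sketch of that logic in plain Python (mirroring, but not copying, the pyspark.mllib.linalg implementation):

```python
import bisect

class SparseVec:
    """Minimal sparse vector; entries is a dict of index -> value."""

    def __init__(self, size, entries):
        self.size = size
        self.indices = sorted(entries)
        self.values = [entries[i] for i in self.indices]

    def __getitem__(self, index):
        if index < 0:
            index += self.size          # support negative indexing
        if not 0 <= index < self.size:
            raise IndexError("index out of range [0, %d)" % self.size)
        pos = bisect.bisect_left(self.indices, index)
        if pos < len(self.indices) and self.indices[pos] == index:
            return self.values[pos]
        return 0.0                      # absent entry is an implicit zero

sv = SparseVec(5, {1: 3.0})
print(sv[1])  # 3.0
print(sv[2])  # 0.0, instead of the IndexError reported above
```

The key point is that `bisect_left` landing past the last stored index is not an error: the element exists in the vector, it is just not stored.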
[jira] [Commented] (SPARK-11025) Exception key can't be empty at getSystemProperties function in utils
[ https://issues.apache.org/jira/browse/SPARK-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950157#comment-14950157 ] Stavros Kontopoulos commented on SPARK-11025: - Falling back to the previous implementation, System.getProperties.clone().asInstanceOf[java.util.Properties].toMap[String, String], which was ignoring it. I guess at the language level Java does not complain, so I think it is fine to ignore it, unless the general strategy is to catch everything that is wrong; I think we should only validate what we use. I know a bare -D can only come up as a mistake; I just wanted to bring to the table what the strategy is, and whether such minor mistakes should fail execution when the Spark context is created. > Exception key can't be empty at getSystemProperties function in utils > -- > > Key: SPARK-11025 > URL: https://issues.apache.org/jira/browse/SPARK-11025 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1, 1.5.1 >Reporter: Stavros Kontopoulos >Priority: Trivial > Labels: easyfix, easytest > > At file core/src/main/scala/org/apache/spark/util/Utils.scala > getSystemProperties function fails when someone passes -D to the jvm and as a > result the key passed is "" (empty). > Exception thrown: java.lang.IllegalArgumentException: key can't be empty > Empty keys should be ignored or just passed them without filtering at that > level as in previous versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
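The pre-1.3 behaviour the comment refers to simply tolerated a bare {{-D}}, which yields a property whose key is the empty string. A hedged sketch of the proposed filtering, in plain Python (the real code lives in Utils.scala; the function name here is a hypothetical stand-in):

```python
def system_properties(raw_props):
    """Drop properties whose key is empty, as produced by a bare -D flag.

    raw_props: a dict standing in for the JVM's System.getProperties;
    illustrative only, not the Scala getSystemProperties helper.
    """
    return {k: v for k, v in raw_props.items() if k}

props = system_properties({"": "", "spark.app.name": "demo"})
print(props)  # the empty key from a bare -D is silently dropped
```

Whether to drop the key silently or fail loudly is exactly the strategy question raised in the comment; the sketch shows only the "ignore" option.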
[jira] [Commented] (SPARK-10326) Cannot launch YARN job on Windows
[ https://issues.apache.org/jira/browse/SPARK-10326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950065#comment-14950065 ] Jose Antonio commented on SPARK-10326: -- Bug reported. Thanks, Jose > Cannot launch YARN job on Windows > -- > > Key: SPARK-10326 > URL: https://issues.apache.org/jira/browse/SPARK-10326 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.5.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin > Fix For: 1.5.0 > > > The fix is already in master, and it's one line out of the patch for > SPARK-5754; the bug is that a Windows file path cannot be used to create a > URI, so {{File.toURI()}} needs to be called. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
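The one-line fix described above is to build the URI from the file path (Java's {{File.toURI()}}) rather than parsing the raw path string as a URI. The same distinction can be shown with Python's pathlib (illustrative only; the actual patch is in Client.scala):

```python
from pathlib import PureWindowsPath

raw = r"C:\spark\bin\..\python\lib\pyspark.zip"
# Parsing this raw string directly as a URI fails: the ':' after the
# drive letter and the backslashes are illegal in a URI, which is
# exactly the URISyntaxException in the traceback above.
uri = PureWindowsPath(raw).as_uri()
print(uri)  # a well-formed file:///C:/... URI
```

Going through the path object first lets the library handle the drive letter and separator escaping instead of the URI parser rejecting them.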
[jira] [Closed] (SPARK-10944) Provide self contained deployment not tighly coupled with Hadoop
[ https://issues.apache.org/jira/browse/SPARK-10944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen closed SPARK-10944. - > Provide self contained deployment not tighly coupled with Hadoop > > > Key: SPARK-10944 > URL: https://issues.apache.org/jira/browse/SPARK-10944 > Project: Spark > Issue Type: New Feature > Components: Deploy >Affects Versions: 1.5.1 > Environment: Mac OS/Java 8/Spark 1.5.1 without hadoop >Reporter: Pranas Baliuka >Priority: Minor > Labels: patch > > Attempt to run Spark cluster on Mac OS machine fails if Hadoop is not > installed. There should be no real need to install full blown Hadoop > installation just to run Spark. > Current situation > {code} > # cd $SPARK_HOME > Imin:spark-1.5.1-bin-without-hadoop pranas$ ./sbin/start-master.sh > {code} > Output: > {code} > starting org.apache.spark.deploy.master.Master, logging to > /Users/pranas/Apps/spark-1.5.1-bin-without-hadoop/sbin/../logs/spark-pranas-org.apache.spark.deploy.master.Master-1-Imin.local.out > failed to launch org.apache.spark.deploy.master.Master: > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > ... 
7 more > full log in > /Users/pranas/Apps/spark-1.5.1-bin-without-hadoop/sbin/../logs/spark-pranas-org.apache.spark.deploy.master.Master-1-Imin.local.out > {code} > Log: > {code} > # Options read when launching programs locally with > # ./bin/run-example or ./bin/spark-submit > Spark Command: > /Library/Java/JavaVirtualMachines/jdk1.8.0_40.jdk/Contents/Home/bin/java -cp > /Users/pranas/Apps/spark-1.5.1-bin-without-hadoop/sbin/../conf/:/Users/pranas/Apps/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.2.0.jar > -Xms1g -Xmx1g org.apache.spark.deploy.master.Master --ip Imin.local --port > 7077 --webui-port 8080 > > Error: A JNI error has occurred, please check your installation and try again > Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/Logger > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > at java.lang.Class.privateGetMethodRecursive(Class.java:3048) > at java.lang.Class.getMethod0(Class.java:3018) > at java.lang.Class.getMethod(Class.java:1784) > at > sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544) > at > sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526) > Caused by: java.lang.ClassNotFoundException: org.slf4j.Logger > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) > {code} > Proposed short term fix: > Bundle all required 3rd party libs to the uberjar and/or fix start-up script > to include required 3rd party libs. > Long term quality improvement proposal: Introduce integration tests to check > distribution before releasing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10945) GraphX computes Pagerank with NaN (with some datasets)
[ https://issues.apache.org/jira/browse/SPARK-10945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950095#comment-14950095 ] Khaled Ammar commented on SPARK-10945: -- Hi [~ankurd], I wonder if you had a chance to work on this issue. Thanks, -Khaled > GraphX computes Pagerank with NaN (with some datasets) > -- > > Key: SPARK-10945 > URL: https://issues.apache.org/jira/browse/SPARK-10945 > Project: Spark > Issue Type: Bug > Components: GraphX >Affects Versions: 1.3.0 > Environment: Linux >Reporter: Khaled Ammar > Labels: test > > Hi, > I run GraphX in a medium size standalone Spark 1.3.0 installation. The > pagerank typically works fine, except with one dataset (Twitter: > http://law.di.unimi.it/webdata/twitter-2010). This is a public dataset that > is commonly used in research papers. > I found that many vertices have an NaN values. This is true, even if the > algorithm run for 1 iteration only. > Thanks, > -Khaled -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
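A quick way to confirm the symptom described above is to scan the computed ranks for NaN entries after collecting them. A minimal plain-Python sketch over a hypothetical collected result (the pairs here are made up for illustration):

```python
import math

# Hypothetical (vertex_id, rank) pairs, as might be collected from
# the ranks.vertices RDD of a GraphX PageRank run
ranks = [(1, 0.92), (2, float("nan")), (3, 1.41)]
bad = [vid for vid, rank in ranks if math.isnan(rank)]
print(bad)  # vertex ids whose PageRank is NaN
```

Note that `rank != rank` also detects NaN, but `math.isnan` states the intent more clearly.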
[jira] [Resolved] (SPARK-10944) Provide self contained deployment not tighly coupled with Hadoop
[ https://issues.apache.org/jira/browse/SPARK-10944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-10944. --- Resolution: Not A Problem [~pranas] please don't reopen an issue unless there is a clear change in the reason that it was closed. Here, Marcelo explained the problem: you're using an artifact that requires you to provide Hadoop classes, but you are not. You should not use this artifact. In fact, Spark does require Hadoop *classes* no matter what. > Provide self contained deployment not tighly coupled with Hadoop > > > Key: SPARK-10944 > URL: https://issues.apache.org/jira/browse/SPARK-10944 > Project: Spark > Issue Type: New Feature > Components: Deploy >Affects Versions: 1.5.1 > Environment: Mac OS/Java 8/Spark 1.5.1 without hadoop >Reporter: Pranas Baliuka >Priority: Minor > Labels: patch > > Attempt to run Spark cluster on Mac OS machine fails if Hadoop is not > installed. There should be no real need to install full blown Hadoop > installation just to run Spark. > Current situation > {code} > # cd $SPARK_HOME > Imin:spark-1.5.1-bin-without-hadoop pranas$ ./sbin/start-master.sh > {code} > Output: > {code} > starting org.apache.spark.deploy.master.Master, logging to > /Users/pranas/Apps/spark-1.5.1-bin-without-hadoop/sbin/../logs/spark-pranas-org.apache.spark.deploy.master.Master-1-Imin.local.out > failed to launch org.apache.spark.deploy.master.Master: > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > ... 
7 more > full log in > /Users/pranas/Apps/spark-1.5.1-bin-without-hadoop/sbin/../logs/spark-pranas-org.apache.spark.deploy.master.Master-1-Imin.local.out > {code} > Log: > {code} > # Options read when launching programs locally with > # ./bin/run-example or ./bin/spark-submit > Spark Command: > /Library/Java/JavaVirtualMachines/jdk1.8.0_40.jdk/Contents/Home/bin/java -cp > /Users/pranas/Apps/spark-1.5.1-bin-without-hadoop/sbin/../conf/:/Users/pranas/Apps/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.2.0.jar > -Xms1g -Xmx1g org.apache.spark.deploy.master.Master --ip Imin.local --port > 7077 --webui-port 8080 > > Error: A JNI error has occurred, please check your installation and try again > Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/Logger > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > at java.lang.Class.privateGetMethodRecursive(Class.java:3048) > at java.lang.Class.getMethod0(Class.java:3018) > at java.lang.Class.getMethod(Class.java:1784) > at > sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544) > at > sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526) > Caused by: java.lang.ClassNotFoundException: org.slf4j.Logger > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) > {code} > Proposed short term fix: > Bundle all required 3rd party libs to the uberjar and/or fix start-up script > to include required 3rd party libs. > Long term quality improvement proposal: Introduce integration tests to check > distribution before releasing. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11025) Exception key can't be empty at getSystemProperties function in utils
[ https://issues.apache.org/jira/browse/SPARK-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-11025: Description: At file core/src/main/scala/org/apache/spark/util/Utils.scala getSystemProperties function fails when someone passes -D to the jvm and as a result the key passed is "" (empty). Exception thrown: java.lang.IllegalArgumentException: key can't be empty Empty keys should be ignored at that level a sin previous versions. was: At file core/src/main/scala/org/apache/spark/util/Utils.scala getSystemProperties function fails when someone passes -D to the jvm and as a result the key passed is "" (empty). Exception thrown: java.lang.IllegalArgumentException: key can't be empty Empty keys should be ignored i think. > Exception key can't be empty at getSystemProperties function in utils > -- > > Key: SPARK-11025 > URL: https://issues.apache.org/jira/browse/SPARK-11025 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1, 1.5.1 >Reporter: Stavros Kontopoulos >Priority: Trivial > Labels: easyfix, easytest > > At file core/src/main/scala/org/apache/spark/util/Utils.scala > getSystemProperties function fails when someone passes -D to the jvm and as a > result the key passed is "" (empty). > Exception thrown: java.lang.IllegalArgumentException: key can't be empty > Empty keys should be ignored at that level a sin previous versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11025) Exception key can't be empty at getSystemProperties function in utils
[ https://issues.apache.org/jira/browse/SPARK-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950145#comment-14950145 ] Sean Owen commented on SPARK-11025: --- What behavior do you suggest - ignoring it? Clearly {{-D}} by itself is a mistake though. It should cause an error that you notice. > Exception key can't be empty at getSystemProperties function in utils > -- > > Key: SPARK-11025 > URL: https://issues.apache.org/jira/browse/SPARK-11025 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1, 1.5.1 >Reporter: Stavros Kontopoulos >Priority: Trivial > Labels: easyfix, easytest > > At file core/src/main/scala/org/apache/spark/util/Utils.scala > getSystemProperties function fails when someone passes -D to the jvm and as a > result the key passed is "" (empty). > Exception thrown: java.lang.IllegalArgumentException: key can't be empty > Empty keys should be ignored at that level as in previous versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11025) Exception key can't be empty at getSystemProperties function in utils
[ https://issues.apache.org/jira/browse/SPARK-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stavros Kontopoulos updated SPARK-11025: Description: At file core/src/main/scala/org/apache/spark/util/Utils.scala getSystemProperties function fails when someone passes -D to the jvm and as a result the key passed is "" (empty). Exception thrown: java.lang.IllegalArgumentException: key can't be empty Empty keys should be ignored at that level as in previous versions. was: At file core/src/main/scala/org/apache/spark/util/Utils.scala getSystemProperties function fails when someone passes -D to the jvm and as a result the key passed is "" (empty). Exception thrown: java.lang.IllegalArgumentException: key can't be empty Empty keys should be ignored at that level a sin previous versions. > Exception key can't be empty at getSystemProperties function in utils > -- > > Key: SPARK-11025 > URL: https://issues.apache.org/jira/browse/SPARK-11025 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1, 1.5.1 >Reporter: Stavros Kontopoulos >Priority: Trivial > Labels: easyfix, easytest > > At file core/src/main/scala/org/apache/spark/util/Utils.scala > getSystemProperties function fails when someone passes -D to the jvm and as a > result the key passed is "" (empty). > Exception thrown: java.lang.IllegalArgumentException: key can't be empty > Empty keys should be ignored at that level as in previous versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11006) Rename NullColumnAccess as NullColumnAccessor
[ https://issues.apache.org/jira/browse/SPARK-11006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-11006: -- Assignee: Ted Yu > Rename NullColumnAccess as NullColumnAccessor > - > > Key: SPARK-11006 > URL: https://issues.apache.org/jira/browse/SPARK-11006 > Project: Spark > Issue Type: Task >Reporter: Ted Yu >Assignee: Ted Yu >Priority: Trivial > Fix For: 1.6.0 > > > In sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnAccessor.scala, > NullColumnAccess should be renamed as NullColumnAccessor so that the same > convention is adhered to for the accessors. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10902) Hive UDF current_database() does not work
[ https://issues.apache.org/jira/browse/SPARK-10902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-10902: -- Assignee: Davies Liu > Hive UDF current_database() does not work > - > > Key: SPARK-10902 > URL: https://issues.apache.org/jira/browse/SPARK-10902 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Davies Liu >Assignee: Davies Liu >Priority: Critical > Fix For: 1.6.0 > > > Hive UDF current_database() is foldable; it needs to access the SessionState > in metadataHive to evaluate it, but this is not accessible while optimizing the > query plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11004) MapReduce Hive-like join operations for RDDs
[ https://issues.apache.org/jira/browse/SPARK-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-11004: -- Component/s: Shuffle > MapReduce Hive-like join operations for RDDs > > > Key: SPARK-11004 > URL: https://issues.apache.org/jira/browse/SPARK-11004 > Project: Spark > Issue Type: New Feature > Components: Shuffle >Reporter: Glenn Strycker > > Could a feature be added to Spark that would use disk-only MapReduce > operations for the very largest RDD joins? > MapReduce is able to handle incredibly large table joins in a stable, > predictable way with gracious failures and recovery. I have applications > that are able to join 2 tables without error in Hive, but these same tables, > when converted into RDDs, are unable to join in Spark (I am using the same > cluster, and have played around with all of the memory configurations, > persisting to disk, checkpointing, etc., and the RDDs are just too big for > Spark on my cluster) > So, Spark is usually able to handle fairly large RDD joins, but occasionally > runs into problems when the tables are just too big (e.g. the notorious 2GB > shuffle limit issue, memory problems, etc.) There are so many parameters to > adjust (number of partitions, number of cores, memory per core, etc.) that it > is difficult to guarantee stability on a shared cluster (say, running Yarn) > with other jobs. > Could a feature be added to Spark that would use disk-only MapReduce commands > to do very large joins? > That is, instead of myRDD1.join(myRDD2), we would have a special operation > myRDD1.mapReduceJoin(myRDD2) that would checkpoint both RDDs to disk, run > MapReduce, and then convert the results of the join back into a standard RDD. 
> This might add stability for Spark jobs that deal with extremely large data, > and enable developers to mix-and-match some Spark and MapReduce operations in > the same program, rather than writing Hive scripts and stringing together > Spark and MapReduce programs, which has extremely large overhead to convert > RDDs to Hive tables and back again. > Despite memory-level operations being where most of Spark's speed gains lie, > sometimes using disk-only may help with stability! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
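For illustration only, the proposed disk-first join behavior — spill both sides to disk, then join without holding either dataset in memory — can be sketched as an external sort-merge join. Everything below, names included, is a hypothetical sketch of the technique, not Spark or MapReduce code, and it assumes keys (and same-key values) are sortable:

```python
import itertools
import json
import tempfile

def spill_sorted(pairs):
    """Write (key, value) pairs to disk sorted by key; return the file path."""
    f = tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False)
    for k, v in sorted(pairs):
        f.write(json.dumps([k, v]) + "\n")
    f.close()
    return f.name

def _grouped(path):
    """Stream (key, [values]) groups from a sorted spill file."""
    with open(path) as f:
        rows = (json.loads(line) for line in f)
        for k, group in itertools.groupby(rows, key=lambda kv: kv[0]):
            yield k, [v for _, v in group]

def disk_merge_join(left, right):
    """Sort-merge join holding only one key-group per side in memory."""
    lit = _grouped(spill_sorted(left))
    rit = _grouped(spill_sorted(right))
    out = []
    lk, rk = next(lit, None), next(rit, None)
    while lk is not None and rk is not None:
        if lk[0] < rk[0]:
            lk = next(lit, None)
        elif lk[0] > rk[0]:
            rk = next(rit, None)
        else:
            # matching key: emit the cross product of the two groups
            for lv in lk[1]:
                for rv in rk[1]:
                    out.append((lk[0], (lv, rv)))
            lk, rk = next(lit, None), next(rit, None)
    return out
```

A real implementation would also spill the sort itself in chunks and merge them; the point here is only that join stability comes from bounding memory to one key-group at a time.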
[jira] [Updated] (SPARK-8654) Analysis exception when using "NULL IN (...)": invalid cast
[ https://issues.apache.org/jira/browse/SPARK-8654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-8654: - Fix Version/s: (was: 1.6.0) > Analysis exception when using "NULL IN (...)": invalid cast > --- > > Key: SPARK-8654 > URL: https://issues.apache.org/jira/browse/SPARK-8654 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Santiago M. Mola >Priority: Minor > > The following query throws an analysis exception: > {code} > SELECT * FROM t WHERE NULL NOT IN (1, 2, 3); > {code} > The exception is: > {code} > org.apache.spark.sql.AnalysisException: invalid cast from int to null; > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:66) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:52) > {code} > Here is a test that can be added to AnalysisSuite to check the issue: > {code} > test("SPARK- regression test") { > val plan = Project(Alias(In(Literal(null), Seq(Literal(1), Literal(2))), > "a")() :: Nil, > LocalRelation() > ) > caseInsensitiveAnalyze(plan) > } > {code} > Note that this kind of query is a corner case, but it is still valid SQL. An > expression such as "NULL IN (...)" or "NULL NOT IN (...)" always gives NULL > as a result, even if the list contains NULL. So it is safe to translate these > expressions to Literal(null) during analysis. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
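The constant-folding argument above rests on SQL's three-valued logic. A minimal sketch of IN semantics, with Python's `None` standing in for SQL NULL, shows why a NULL left-hand side always yields UNKNOWN and can therefore be folded to `Literal(null)`:

```python
def sql_in(value, candidates):
    """SQL three-valued IN: returns True, False, or None (UNKNOWN).

    Any comparison involving NULL (modeled as None) is UNKNOWN, so a NULL
    left-hand side is always UNKNOWN -- which is why "NULL IN (...)" and
    "NULL NOT IN (...)" can safely be folded to Literal(null) during
    analysis. Illustrative sketch, not Catalyst code.
    """
    if value is None:
        return None
    saw_null = False
    for c in candidates:
        if c is None:
            saw_null = True  # a NULL candidate can neither match nor rule out
        elif c == value:
            return True
    return None if saw_null else False
```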
[jira] [Commented] (SPARK-6847) Stack overflow on updateStateByKey which followed by a dstream with checkpoint set
[ https://issues.apache.org/jira/browse/SPARK-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950156#comment-14950156 ] Glyton Camilleri commented on SPARK-6847: - Hi, I've also bumped into this very same issue but couldn't find a good value for {{checkpoint}}; our setup consists of a kafka-stream with 10s time-window, trying various values for the checkpoint interval (default, 10s, and 15s). It always takes a long time for the exception to appear, often in the range of 10 hours or so, making the problem relatively painful to debug. We'll be trying to investigate further, but it would be great if someone could shed some more light on the issue. > Stack overflow on updateStateByKey which followed by a dstream with > checkpoint set > -- > > Key: SPARK-6847 > URL: https://issues.apache.org/jira/browse/SPARK-6847 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.3.0 >Reporter: Jack Hu > Labels: StackOverflowError, Streaming > > The issue happens with the following sample code: uses {{updateStateByKey}} > followed by a {{map}} with checkpoint interval 10 seconds > {code} > val sparkConf = new SparkConf().setAppName("test") > val streamingContext = new StreamingContext(sparkConf, Seconds(10)) > streamingContext.checkpoint("""checkpoint""") > val source = streamingContext.socketTextStream("localhost", ) > val updatedResult = source.map( > (1,_)).updateStateByKey( > (newlist : Seq[String], oldstate : Option[String]) => > newlist.headOption.orElse(oldstate)) > updatedResult.map(_._2) > .checkpoint(Seconds(10)) > .foreachRDD((rdd, t) => { > println("Deep: " + rdd.toDebugString.split("\n").length) > println(t.toString() + ": " + rdd.collect.length) > }) > streamingContext.start() > streamingContext.awaitTermination() > {code} > From the output, we can see that the dependency will be increasing time over > time, the {{updateStateByKey}} never get check-pointed, and finally, the > stack overflow will happen. 
> Note: > * The rdd in {{updatedResult.map(_._2)}} get check-pointed in this case, but > not the {{updateStateByKey}} > * If remove the {{checkpoint(Seconds(10))}} from the map result ( > {{updatedResult.map(_._2)}} ), the stack overflow will not happen -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7751) Add @Since annotation to stable and experimental methods in MLlib
[ https://issues.apache.org/jira/browse/SPARK-7751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950169#comment-14950169 ] Alex Hu commented on SPARK-7751: This is late as the epic is almost complete but an alternative of determining a string's provenance would be to run the following command. {code} git log -S{string} {filePath} {code} After determining the relevant commit, you can determine the tag with {code} git tag --contains {commit} {code} > Add @Since annotation to stable and experimental methods in MLlib > - > > Key: SPARK-7751 > URL: https://issues.apache.org/jira/browse/SPARK-7751 > Project: Spark > Issue Type: Umbrella > Components: Documentation, MLlib >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng >Priority: Minor > Labels: starter > > This is useful to check whether a feature exists in some version of Spark. > This is an umbrella JIRA to track the progress. We want to have -@since tag- > @Since annotation for both stable (those without any > Experimental/DeveloperApi/AlphaComponent annotations) and experimental > methods in MLlib: > (Do NOT tag private or package private classes or methods, nor local > variables and methods.) > * an example PR for Scala: https://github.com/apache/spark/pull/8309 > We need to dig the history of git commit to figure out what was the Spark > version when a method was first introduced. Take `NaiveBayes.setModelType` as > an example. We can grep `def setModelType` at different version git tags. > {code} > meng@xm:~/src/spark > $ git show > v1.3.0:mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala > | grep "def setModelType" > meng@xm:~/src/spark > $ git show > v1.4.0:mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala > | grep "def setModelType" > def setModelType(modelType: String): NaiveBayes = { > {code} > If there are better ways, please let us know. 
> We cannot add all -@since tags- @Since annotations in a single PR, which would be > hard to review. So we made some subtasks for each package, for example > `org.apache.spark.classification`. Feel free to add more sub-tasks for Python > and the `spark.ml` package. > Plan: > 1. In 1.5, we try to add @Since annotation to all stable/experimental methods > under `spark.mllib`. > 2. Starting from 1.6, we require @Since annotation in all new PRs. > 3. In 1.6, we try to add @Since annotation to all stable/experimental methods > under `spark.ml`, `pyspark.mllib`, and `pyspark.ml`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-11041) Add (NOT) IN / EXISTS support for predicates
[ https://issues.apache.org/jira/browse/SPARK-11041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao closed SPARK-11041. - Resolution: Duplicate > Add (NOT) IN / EXISTS support for predicates > > > Key: SPARK-11041 > URL: https://issues.apache.org/jira/browse/SPARK-11041 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Cheng Hao > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11043) Hive Thrift Server will log warn "Couldn't find log associated with operation handle"
[ https://issues.apache.org/jira/browse/SPARK-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11043: Assignee: (was: Apache Spark) > Hive Thrift Server will log warn "Couldn't find log associated with operation > handle" > - > > Key: SPARK-11043 > URL: https://issues.apache.org/jira/browse/SPARK-11043 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: SaintBacchus > > The warnning log is below: > {code:title=Warnning Log|borderStyle=solid} > 15/10/09 16:48:23 WARN thrift.ThriftCLIService: Error fetching results: > org.apache.hive.service.cli.HiveSQLException: Couldn't find log associated > with operation handle: OperationHandle [opType=EXECUTE_STATEMENT, > getHandleIdentifier()=fb0900c7-6244-432e-a779-b449ca7f7ca0] > at > org.apache.hive.service.cli.operation.OperationManager.getOperationLogRowSet(OperationManager.java:229) > at > org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:687) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) > at > org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) > at > org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) > at com.sun.proxy.$Proxy32.fetchResults(Unknown Source) > at > 
org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:454) > at > org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:672) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > Once I execute a statement, there will have this warnning log by the default > configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11043) Hive Thrift Server will log warn "Couldn't find log associated with operation handle"
[ https://issues.apache.org/jira/browse/SPARK-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11043: Assignee: Apache Spark > Hive Thrift Server will log warn "Couldn't find log associated with operation > handle" > - > > Key: SPARK-11043 > URL: https://issues.apache.org/jira/browse/SPARK-11043 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: SaintBacchus >Assignee: Apache Spark > > The warnning log is below: > {code:title=Warnning Log|borderStyle=solid} > 15/10/09 16:48:23 WARN thrift.ThriftCLIService: Error fetching results: > org.apache.hive.service.cli.HiveSQLException: Couldn't find log associated > with operation handle: OperationHandle [opType=EXECUTE_STATEMENT, > getHandleIdentifier()=fb0900c7-6244-432e-a779-b449ca7f7ca0] > at > org.apache.hive.service.cli.operation.OperationManager.getOperationLogRowSet(OperationManager.java:229) > at > org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:687) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) > at > org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) > at > org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) > at com.sun.proxy.$Proxy32.fetchResults(Unknown Source) > at > 
org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:454) > at > org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:672) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > Once I execute a statement, there will have this warnning log by the default > configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10306) sbt hive/update issue
[ https://issues.apache.org/jira/browse/SPARK-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951371#comment-14951371 ] holdenk commented on SPARK-10306: - So the pull request that I posted has a solution that works for me, but I've avoided up-streaming it since the other Spark developers were not experiencing the issue. Could other people who experience this run "hive/evicted" and "hive/dependencyTree" and post the results here? > sbt hive/update issue > - > > Key: SPARK-10306 > URL: https://issues.apache.org/jira/browse/SPARK-10306 > Project: Spark > Issue Type: Bug > Components: Build >Reporter: holdenk >Priority: Trivial > > Running sbt hive/update sometimes results in the error "impossible to get > artifacts when data has not been loaded. IvyNode = > org.scala-lang#scala-library;2.10.3" which is unfortunate since it is always > evicted by 2.10.4 currently. An easy (but maybe not super clean) solution > would be adding 2.10.3 as a dependency which will then get evicted. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-10306) sbt hive/update issue
[ https://issues.apache.org/jira/browse/SPARK-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reopened SPARK-10306: - re-opened since other users are also experiencing the issue > sbt hive/update issue > - > > Key: SPARK-10306 > URL: https://issues.apache.org/jira/browse/SPARK-10306 > Project: Spark > Issue Type: Bug > Components: Build >Reporter: holdenk >Priority: Trivial > > Running sbt hive/update sometimes results in the error "impossible to get > artifacts when data has not been loaded. IvyNode = > org.scala-lang#scala-library;2.10.3" which is unfortunate since it is always > evicted by 2.10.4 currently. An easy (but maybe not super clean) solution > would be adding 2.10.3 as a dependency which will then get evicted. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2309) Generalize the binary logistic regression into multinomial logistic regression
[ https://issues.apache.org/jira/browse/SPARK-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951438#comment-14951438 ] DB Tsai commented on SPARK-2309: I don't quite get you; can you elaborate? But I'm pretty sure that the implementation in Spark MLlib is the same as in the slides, and that's the standard multinomial LoR. You can check the test code, which shows that the result matches R. > Generalize the binary logistic regression into multinomial logistic regression > -- > > Key: SPARK-2309 > URL: https://issues.apache.org/jira/browse/SPARK-2309 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: DB Tsai >Assignee: DB Tsai >Priority: Critical > Fix For: 1.3.0 > > > Currently, there is no multi-class classifier in MLlib. Logistic regression > can be extended to the multinomial one straightforwardly. > The following formula will be implemented. > http://www.slideshare.net/dbtsai/2014-0620-mlor-36132297/25 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
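For readers following along, the standard multinomial LoR referenced above predicts class probabilities with a softmax over per-class linear scores. This is a generic sketch of that formula only; MLlib's actual implementation differs in parameterization and optimization:

```python
import math

def softmax(scores):
    """Numerically stable softmax over raw per-class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(weights, x):
    """Multinomial logistic regression class probabilities.

    `weights` holds one coefficient vector per class; the pivot class of
    the K-1 parameterization can be represented as an all-zero row.
    """
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]
    return softmax(scores)
```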
[jira] [Created] (SPARK-11043) Hive Thrift Server will log warn "Couldn't find log associated with operation handle"
SaintBacchus created SPARK-11043: Summary: Hive Thrift Server will log warn "Couldn't find log associated with operation handle" Key: SPARK-11043 URL: https://issues.apache.org/jira/browse/SPARK-11043 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.6.0 Reporter: SaintBacchus The warnning log is below: {code:title=Warnning Log|borderStyle=solid} 15/10/09 16:48:23 WARN thrift.ThriftCLIService: Error fetching results: org.apache.hive.service.cli.HiveSQLException: Couldn't find log associated with operation handle: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=fb0900c7-6244-432e-a779-b449ca7f7ca0] at org.apache.hive.service.cli.operation.OperationManager.getOperationLogRowSet(OperationManager.java:229) at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:687) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) at com.sun.proxy.$Proxy32.fetchResults(Unknown Source) at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:454) at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:672) at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553) at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11043) Hive Thrift Server will log warn "Couldn't find log associated with operation handle"
[ https://issues.apache.org/jira/browse/SPARK-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951526#comment-14951526 ] Apache Spark commented on SPARK-11043: -- User 'SaintBacchus' has created a pull request for this issue: https://github.com/apache/spark/pull/9056 > Hive Thrift Server will log warn "Couldn't find log associated with operation > handle" > - > > Key: SPARK-11043 > URL: https://issues.apache.org/jira/browse/SPARK-11043 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: SaintBacchus > > The warnning log is below: > {code:title=Warnning Log|borderStyle=solid} > 15/10/09 16:48:23 WARN thrift.ThriftCLIService: Error fetching results: > org.apache.hive.service.cli.HiveSQLException: Couldn't find log associated > with operation handle: OperationHandle [opType=EXECUTE_STATEMENT, > getHandleIdentifier()=fb0900c7-6244-432e-a779-b449ca7f7ca0] > at > org.apache.hive.service.cli.operation.OperationManager.getOperationLogRowSet(OperationManager.java:229) > at > org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:687) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) > at > org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) > at > org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > 
org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) > at com.sun.proxy.$Proxy32.fetchResults(Unknown Source) > at > org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:454) > at > org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:672) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553) > at > org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > Once I execute a statement, there will have this warnning log by the default > configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11013) SparkPlan may mistakenly register child plan's accumulators for SQL metrics
[ https://issues.apache.org/jira/browse/SPARK-11013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951534#comment-14951534 ] Wenchen Fan commented on SPARK-11013: - The problem is that we report accumulators that should not be reported. For example, in a query plan "Aggregate -> Exchange -> Aggregate", we define 2 metrics for `Aggregate`: `numInputRows` and `numOutputRows`. This query has 2 stages (let's say stg1 and stg2) that are split by the Exchange. When we run stg1, we should report 2 accumulators for the bottom Aggregate. When we run stg2, we should report another 2 accumulators for the top Aggregate. However, when we run stg2, we report 4 accumulators, and 2 of them are for the bottom Aggregate, introduced by the serialization problem described before, and they never get updated. The bottom Aggregate's metrics then get an extra zero-value update, which may lead to wrong results for future metrics like min. > SparkPlan may mistakenly register child plan's accumulators for SQL metrics > --- > > Key: SPARK-11013 > URL: https://issues.apache.org/jira/browse/SPARK-11013 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Wenchen Fan > > The reason is that when we call the RDD API inside SparkPlan, we are very likely > to reference the SparkPlan in the closure and thus serialize and transfer a > SparkPlan tree to the executor side. When we deserialize it, the accumulators in > the child SparkPlan are also deserialized and registered, and always report zero > value. > This is not a problem currently because we only have one operation to > aggregate the accumulators: add. However, if we want to support more complex > metrics like min, the extra zero values will lead to wrong results. > Take TungstenAggregate as an example: I logged "stageId, partitionId, > accumName, accumId" when an accumulator is deserialized and registered, and > logged the "accumId -> accumValue" map when a task ends. 
The output is: > {code} > scala> val df = Seq(1 -> "a", 2 -> "b").toDF("a", "b").groupBy().count() > df: org.apache.spark.sql.DataFrame = [count: bigint] > scala> df.collect > register: 0 0 Some(number of input rows) 4 > register: 0 0 Some(number of output rows) 5 > register: 1 0 Some(number of input rows) 4 > register: 1 0 Some(number of output rows) 5 > register: 1 0 Some(number of input rows) 2 > register: 1 0 Some(number of output rows) 3 > Map(5 -> 1, 4 -> 2, 6 -> 4458496) > Map(5 -> 0, 2 -> 1, 7 -> 4458496, 3 -> 1, 4 -> 0) > res0: Array[org.apache.spark.sql.Row] = Array([2]) > {code} > The best choice is to avoid serialize and deserialize a SparkPlan tree, which > can be achieved by LocalNode. > Or we can do some workaround to fix this serialization problem for the > problematic SparkPlans like TungstenAggregate, TungstenSort. > Or we can improve the SQL metrics framework to make it more robust to this > case. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
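The add-vs-min point above can be illustrated without Spark: an accumulator that is registered on an executor but never incremented contributes a spurious zero update, which is invisible under add but corrupts min. A toy simulation (not Spark's metrics code):

```python
def aggregate(updates, op, initial):
    """Fold task-side accumulator updates into a driver-side metric value."""
    value = initial
    for u in updates:
        value = op(value, u)
    return value

# Updates from tasks that actually ran the operator:
real_updates = [3, 5]
# Plus a spurious zero from an accumulator that was deserialized and
# registered on an executor but never incremented (the bug described above):
ghost_updates = real_updates + [0]

# 'add' hides the problem: the extra zero changes nothing.
sum_real = aggregate(real_updates, lambda a, b: a + b, 0)
sum_ghost = aggregate(ghost_updates, lambda a, b: a + b, 0)

# 'min' does not: the ghost zero wins and corrupts the metric.
min_real = aggregate(real_updates, min, float("inf"))
min_ghost = aggregate(ghost_updates, min, float("inf"))
```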
[jira] [Commented] (SPARK-6567) Large linear model parallelism via a join and reduceByKey
[ https://issues.apache.org/jira/browse/SPARK-6567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951540#comment-14951540 ] Ashish Gupta commented on SPARK-6567: - did this effort succeed? > Large linear model parallelism via a join and reduceByKey > - > > Key: SPARK-6567 > URL: https://issues.apache.org/jira/browse/SPARK-6567 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Reporter: Reza Zadeh > Attachments: model-parallelism.pptx > > > To train a linear model, each training point in the training set needs its > dot product computed against the model, per iteration. If the model is large > (too large to fit in memory on a single machine) then SPARK-4590 proposes > using parameter server. > There is an easier way to achieve this without parameter servers. In > particular, if the data is held as a BlockMatrix and the model as an RDD, > then each block can be joined with the relevant part of the model, followed > by a reduceByKey to compute the dot products. > This obviates the need for a parameter server, at least for linear models. > However, it's unclear how it compares performance-wise to parameter servers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
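The join-plus-reduceByKey idea in the description can be sketched with plain dictionaries standing in for RDDs: split the model and each row's features into matching column blocks, join on the block id to compute partial dot products, then reduce by row id. A hypothetical illustration of the scheme, not MLlib code:

```python
from collections import defaultdict

def block_dot_products(data_blocks, model_blocks):
    """Dot products of many rows against a large model via join + reduce.

    data_blocks:  {block_id: {row_id: sub_vector}} -- each row's features
                  split into column blocks, as in a BlockMatrix.
    model_blocks: {block_id: sub_vector} -- the model split the same way,
                  standing in for the model RDD.
    The loop over data_blocks plays the role of the join on block_id; the
    totals accumulation plays the role of reduceByKey(_ + _) on row_id.
    """
    partials = []  # the joined-and-mapped "RDD": (row_id, partial_dot)
    for block_id, rows in data_blocks.items():
        w = model_blocks[block_id]  # join: pair the block with its model slice
        for row_id, v in rows.items():
            partials.append((row_id, sum(a * b for a, b in zip(v, w))))
    totals = defaultdict(float)  # reduceByKey: sum partial products per row
    for row_id, p in partials:
        totals[row_id] += p
    return dict(totals)
```

No single machine ever needs the whole model: each block of the model meets only the data blocks that share its block id.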
[jira] [Assigned] (SPARK-11040) SaslRpcHandler does not delegate all methods to underlying handler
[ https://issues.apache.org/jira/browse/SPARK-11040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11040:
------------------------------------

    Assignee: Apache Spark

> SaslRpcHandler does not delegate all methods to underlying handler
> ------------------------------------------------------------------
>
> Key: SPARK-11040
> URL: https://issues.apache.org/jira/browse/SPARK-11040
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.6.0
> Reporter: Marcelo Vanzin
> Assignee: Apache Spark
>
> {{SaslRpcHandler}} only delegates {{receive}} and {{getStreamManager}}, so
> when SASL is enabled, other events will be missed by apps.
> This affects other versions too, but I think these events aren't actually
> used there. They'll be used by the new RPC backend in 1.6, though.
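The bug class described above is a wrapper that forwards only some of its delegate's methods, silently dropping the rest. A small Python sketch of the pattern and one common fix (class and method names here are illustrative, not the actual Spark API):

```python
class Handler:
    def receive(self, msg): return f"got {msg}"
    def get_stream_manager(self): return "streams"
    def connection_terminated(self, client): return f"closed {client}"

class PartialWrapper:
    """Like SaslRpcHandler before the fix: only two methods delegated."""
    def __init__(self, delegate): self._d = delegate
    def receive(self, msg): return self._d.receive(msg)
    def get_stream_manager(self): return self._d.get_stream_manager()
    # connection_terminated is not forwarded -- that event never reaches the app.

class FullWrapper:
    """Intercept what needs intercepting; forward everything else."""
    def __init__(self, delegate): self._d = delegate
    def receive(self, msg): return self._d.receive(msg)  # e.g. SASL handshake hook
    def __getattr__(self, name):
        # Fallback for any method not explicitly overridden above.
        return getattr(self._d, name)

w = FullWrapper(Handler())
print(w.connection_terminated("c1"))  # forwarded: closed c1
```

In a statically typed language like the Scala/Java code in question, the equivalent fix is to override every method of the handler interface and delegate explicitly, since there is no `__getattr__`-style catch-all.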
[jira] [Commented] (SPARK-10876) display total application time in spark history UI
[ https://issues.apache.org/jira/browse/SPARK-10876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951419#comment-14951419 ]

Jakob Odersky commented on SPARK-10876:
---------------------------------------

I'm not sure what you mean. The UI already has a "Duration" field for every job.

> display total application time in spark history UI
> --------------------------------------------------
>
> Key: SPARK-10876
> URL: https://issues.apache.org/jira/browse/SPARK-10876
> Project: Spark
> Issue Type: Improvement
> Components: Web UI
> Affects Versions: 1.5.1
> Reporter: Thomas Graves
>
> The history file has application start and application end events. It
> would be nice if we could use these to display the total run time for the
> application in the history UI.
> Could be displayed similarly to "Total Uptime" for a running application.
[jira] [Issue Comment Deleted] (SPARK-10876) display total application time in spark history UI
[ https://issues.apache.org/jira/browse/SPARK-10876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Odersky updated SPARK-10876:
----------------------------------
    Comment: was deleted

(was: I'm not sure what you mean. The UI already has a "Duration" field for every job.)

> display total application time in spark history UI
> --------------------------------------------------
>
> Key: SPARK-10876
> URL: https://issues.apache.org/jira/browse/SPARK-10876
> Project: Spark
> Issue Type: Improvement
> Components: Web UI
> Affects Versions: 1.5.1
> Reporter: Thomas Graves
>
> The history file has application start and application end events. It
> would be nice if we could use these to display the total run time for the
> application in the history UI.
> Could be displayed similarly to "Total Uptime" for a running application.
[jira] [Created] (SPARK-11041) Add (NOT) IN / EXISTS support for predicates
Cheng Hao created SPARK-11041:
---------------------------------

Summary: Add (NOT) IN / EXISTS support for predicates
Key: SPARK-11041
URL: https://issues.apache.org/jira/browse/SPARK-11041
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Cheng Hao
[jira] [Updated] (SPARK-11043) Hive Thrift Server will log warn "Couldn't find log associated with operation handle"
[ https://issues.apache.org/jira/browse/SPARK-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

SaintBacchus updated SPARK-11043:
---------------------------------
    Description: 
The warning log is below:
{code:title=Warning Log|borderStyle=solid}
15/10/09 16:48:23 WARN thrift.ThriftCLIService: Error fetching results: 
org.apache.hive.service.cli.HiveSQLException: Couldn't find log associated with operation handle: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=fb0900c7-6244-432e-a779-b449ca7f7ca0]
    at org.apache.hive.service.cli.operation.OperationManager.getOperationLogRowSet(OperationManager.java:229)
    at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:687)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
    at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
    at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
    at com.sun.proxy.$Proxy32.fetchResults(Unknown Source)
    at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:454)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:672)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}
Once I execute a statement, this warning is logged under the default configuration.

  was:
The warning log is below:
{code:title=Warning Log|borderStyle=solid}
15/10/09 16:48:23 WARN thrift.ThriftCLIService: Error fetching results: 
org.apache.hive.service.cli.HiveSQLException: Couldn't find log associated with operation handle: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=fb0900c7-6244-432e-a779-b449ca7f7ca0]
    at org.apache.hive.service.cli.operation.OperationManager.getOperationLogRowSet(OperationManager.java:229)
    at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:687)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
    at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
    at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
    at com.sun.proxy.$Proxy32.fetchResults(Unknown Source)
    at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:454)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:672)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at
[jira] [Assigned] (SPARK-11040) SaslRpcHandler does not delegate all methods to underlying handler
[ https://issues.apache.org/jira/browse/SPARK-11040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11040:
------------------------------------

    Assignee: (was: Apache Spark)

> SaslRpcHandler does not delegate all methods to underlying handler
> ------------------------------------------------------------------
>
> Key: SPARK-11040
> URL: https://issues.apache.org/jira/browse/SPARK-11040
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.6.0
> Reporter: Marcelo Vanzin
>
> {{SaslRpcHandler}} only delegates {{receive}} and {{getStreamManager}}, so
> when SASL is enabled, other events will be missed by apps.
> This affects other versions too, but I think these events aren't actually
> used there. They'll be used by the new RPC backend in 1.6, though.
[jira] [Commented] (SPARK-11040) SaslRpcHandler does not delegate all methods to underlying handler
[ https://issues.apache.org/jira/browse/SPARK-11040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951405#comment-14951405 ]

Apache Spark commented on SPARK-11040:
--------------------------------------

User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/9053

> SaslRpcHandler does not delegate all methods to underlying handler
> ------------------------------------------------------------------
>
> Key: SPARK-11040
> URL: https://issues.apache.org/jira/browse/SPARK-11040
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.6.0
> Reporter: Marcelo Vanzin
>
> {{SaslRpcHandler}} only delegates {{receive}} and {{getStreamManager}}, so
> when SASL is enabled, other events will be missed by apps.
> This affects other versions too, but I think these events aren't actually
> used there. They'll be used by the new RPC backend in 1.6, though.
[jira] [Updated] (SPARK-10985) Avoid passing evicted blocks throughout BlockManager / CacheManager
[ https://issues.apache.org/jira/browse/SPARK-10985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or updated SPARK-10985:
------------------------------
    Assignee: Bowen Zhang

> Avoid passing evicted blocks throughout BlockManager / CacheManager
> -------------------------------------------------------------------
>
> Key: SPARK-10985
> URL: https://issues.apache.org/jira/browse/SPARK-10985
> Project: Spark
> Issue Type: Sub-task
> Components: Block Manager, Spark Core
> Reporter: Andrew Or
> Assignee: Bowen Zhang
> Priority: Minor
>
> This is a minor refactoring task.
> Currently, when we attempt to put a block in, we get back an array buffer of
> blocks that were dropped in the process. We do this to propagate these blocks
> back to our TaskContext, which will add them to its TaskMetrics so we can see
> them in the Spark UI storage tab properly.
> Now that we have TaskContext.get, we can just use that to propagate this
> information. This simplifies a lot of the signatures and gets rid of weird
> return types like the following everywhere:
> {code}
> ArrayBuffer[(BlockId, BlockStatus)]
> {code}
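The refactoring above relies on a per-task context reachable from anywhere on the task's thread, so callees no longer have to thread evicted-block lists through every return type. A Python sketch of that pattern using a thread-local (names are illustrative; this is not Spark's implementation):

```python
import threading

_local = threading.local()

class TaskContext:
    """Per-thread task context, reachable via TaskContext.get()."""
    def __init__(self):
        self.dropped_blocks = []

    @staticmethod
    def get():
        return getattr(_local, "ctx", None)

    @staticmethod
    def set(ctx):
        _local.ctx = ctx

def put_block(block_id):
    # Instead of returning the equivalent of ArrayBuffer[(BlockId, BlockStatus)],
    # record any eviction directly on the current task's context.
    ctx = TaskContext.get()
    if ctx is not None:
        ctx.dropped_blocks.append((block_id, "DROPPED"))
    # ...actual block-store work would go here...

TaskContext.set(TaskContext())
put_block("rdd_0_1")
print(TaskContext.get().dropped_blocks)  # [('rdd_0_1', 'DROPPED')]
```

The trade-off of this design is implicit state: the function signatures get simpler, but correctness now depends on the context being installed on the executing thread before any block operations run.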
[jira] [Commented] (SPARK-10876) display total application time in spark history UI
[ https://issues.apache.org/jira/browse/SPARK-10876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951434#comment-14951434 ]

Jakob Odersky commented on SPARK-10876:
---------------------------------------

Do you mean to display the total run time of uncompleted apps? Completed apps already have a "Duration" field.

> display total application time in spark history UI
> --------------------------------------------------
>
> Key: SPARK-10876
> URL: https://issues.apache.org/jira/browse/SPARK-10876
> Project: Spark
> Issue Type: Improvement
> Components: Web UI
> Affects Versions: 1.5.1
> Reporter: Thomas Graves
>
> The history file has application start and application end events. It
> would be nice if we could use these to display the total run time for the
> application in the history UI.
> Could be displayed similarly to "Total Uptime" for a running application.
[jira] [Commented] (SPARK-4226) SparkSQL - Add support for subqueries in predicates
[ https://issues.apache.org/jira/browse/SPARK-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951468#comment-14951468 ]

Apache Spark commented on SPARK-4226:
-------------------------------------

User 'chenghao-intel' has created a pull request for this issue:
https://github.com/apache/spark/pull/9055

> SparkSQL - Add support for subqueries in predicates
> ---------------------------------------------------
>
> Key: SPARK-4226
> URL: https://issues.apache.org/jira/browse/SPARK-4226
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.2.0
> Environment: Spark 1.2 snapshot
> Reporter: Terry Siu
>
> I have a test table defined in Hive as follows:
> {code:sql}
> CREATE TABLE sparkbug (
>   id INT,
>   event STRING
> ) STORED AS PARQUET;
> {code}
> and insert some sample data with ids 1, 2, 3.
> In a Spark shell, I then create a HiveContext and execute the following
> HQL to test out subquery predicates:
> {code}
> val hc = new HiveContext(sc)
> hc.hql("select customerid from sparkbug where customerid in (select customerid from sparkbug where customerid in (2,3))")
> {code}
> I get the following error:
> {noformat}
> java.lang.RuntimeException: Unsupported language features in query: select customerid from sparkbug where customerid in (select customerid from sparkbug where customerid in (2,3))
> TOK_QUERY
>   TOK_FROM
>     TOK_TABREF
>       TOK_TABNAME
>         sparkbug
>   TOK_INSERT
>     TOK_DESTINATION
>       TOK_DIR
>         TOK_TMP_FILE
>     TOK_SELECT
>       TOK_SELEXPR
>         TOK_TABLE_OR_COL
>           customerid
>     TOK_WHERE
>       TOK_SUBQUERY_EXPR
>         TOK_SUBQUERY_OP
>           in
>         TOK_QUERY
>           TOK_FROM
>             TOK_TABREF
>               TOK_TABNAME
>                 sparkbug
>           TOK_INSERT
>             TOK_DESTINATION
>               TOK_DIR
>                 TOK_TMP_FILE
>             TOK_SELECT
>               TOK_SELEXPR
>                 TOK_TABLE_OR_COL
>                   customerid
>             TOK_WHERE
>               TOK_FUNCTION
>                 in
>                 TOK_TABLE_OR_COL
>                   customerid
>                 2
>                 3
>         TOK_TABLE_OR_COL
>           customerid
>
> scala.NotImplementedError: No parse rules for ASTNode type: 817, text: TOK_SUBQUERY_EXPR :
> TOK_SUBQUERY_EXPR
>   TOK_SUBQUERY_OP
>     in
>   TOK_QUERY
>     TOK_FROM
>       TOK_TABREF
>         TOK_TABNAME
>           sparkbug
>     TOK_INSERT
>       TOK_DESTINATION
>         TOK_DIR
>           TOK_TMP_FILE
>       TOK_SELECT
>         TOK_SELEXPR
>           TOK_TABLE_OR_COL
>             customerid
>       TOK_WHERE
>         TOK_FUNCTION
>           in
>           TOK_TABLE_OR_COL
>             customerid
>           2
>           3
>   TOK_TABLE_OR_COL
>     customerid
> " +
> org.apache.spark.sql.hive.HiveQl$.nodeToExpr(HiveQl.scala:1098)
>     at scala.sys.package$.error(package.scala:27)
>     at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:252)
>     at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:50)
>     at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:49)
>     at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
> {noformat}
> [This thread|http://apache-spark-user-list.1001560.n3.nabble.com/Subquery-in-having-clause-Spark-1-1-0-td17401.html]
> also brings up the lack of subquery support in SparkSQL. It would be nice to
> have subquery predicate support in a near-future release (1.3, maybe?).
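An `IN (subquery)` predicate like the one the parser rejects above is commonly evaluated by first computing the subquery's result set and then filtering the outer relation against it (conceptually a left semi-join). A plain-Python sketch of that semantics, not of Catalyst's actual rewrite:

```python
# Rows of the table, with the ids 1, 2, 3 from the bug report.
rows = [{"id": 1}, {"id": 2}, {"id": 3}]

# Inner query: select id from sparkbug where id in (2, 3)
subquery = {r["id"] for r in rows if r["id"] in (2, 3)}

# Outer query as a semi-join: keep outer rows that have a match
# in the subquery's result set.
result = [r for r in rows if r["id"] in subquery]
print(result)  # [{'id': 2}, {'id': 3}]
```

Using a set for the subquery result makes the membership test O(1) per outer row, which is essentially why engines prefer hash-based semi-joins for uncorrelated IN subqueries.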
[jira] [Created] (SPARK-11042) Introduce a mechanism to ban creating new root SQLContexts in a JVM
Yin Huai created SPARK-11042:
---------------------------------

Summary: Introduce a mechanism to ban creating new root SQLContexts in a JVM
Key: SPARK-11042
URL: https://issues.apache.org/jira/browse/SPARK-11042
Project: Spark
Issue Type: New Feature
Components: SQL
Reporter: Yin Huai
Assignee: Yin Huai

For some use cases, it will be useful to be able to explicitly ban creating multiple root SQLContexts/HiveContexts. Here, "root SQLContext" means the first SQLContext that gets created.
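One straightforward shape for such a mechanism is to remember the first instance at the class level and fail fast on later constructions. A hedged Python sketch (the class name, the `_allow_multiple` flag, and the error message are all illustrative, not Spark's API):

```python
class RootContext:
    _instantiated = None       # the "root" instance, once created
    _allow_multiple = False    # would map to an opt-out config flag

    def __init__(self):
        cls = type(self)
        if cls._instantiated is not None and not cls._allow_multiple:
            raise RuntimeError("a root context already exists in this process")
        if cls._instantiated is None:
            cls._instantiated = self  # first construction wins

first = RootContext()
try:
    RootContext()  # second construction is banned
except RuntimeError as e:
    print(e)
```

In the JVM the same idea would additionally need synchronization, since "first construction wins" must hold across threads.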
[jira] [Created] (SPARK-11038) Consolidate the format of UnsafeArrayData and UnsafeMapData
Davies Liu created SPARK-11038:
---------------------------------

Summary: Consolidate the format of UnsafeArrayData and UnsafeMapData
Key: SPARK-11038
URL: https://issues.apache.org/jira/browse/SPARK-11038
Project: Spark
Issue Type: Improvement
Reporter: Davies Liu
[jira] [Commented] (SPARK-10930) History "Stages" page "duration" can be confusing
[ https://issues.apache.org/jira/browse/SPARK-10930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951265#comment-14951265 ]

Apache Spark commented on SPARK-10930:
--------------------------------------

User 'd2r' has created a pull request for this issue:
https://github.com/apache/spark/pull/9051

> History "Stages" page "duration" can be confusing
> -------------------------------------------------
>
> Key: SPARK-10930
> URL: https://issues.apache.org/jira/browse/SPARK-10930
> Project: Spark
> Issue Type: Improvement
> Components: Web UI
> Affects Versions: 1.5.1
> Reporter: Thomas Graves
>
> The Spark history server's "Stages" page shows each stage's submitted time
> and duration. The duration can be confusing, since the time a stage actually
> starts tasks might be much later than when it was submitted if it is waiting
> on previous stages. This makes it hard to figure out which stages were really
> slow without clicking into each stage.
> It would be nice to perhaps have a first-task-launched time or a processing
> time for each stage, to easily be able to find the slow stages.
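The two notions of duration discussed in the issue can be made concrete with a small sketch (field names and the timestamps are illustrative):

```python
# Event timestamps for one stage, in seconds.
stage = {"submitted": 100.0, "first_task_launched": 160.0, "completed": 170.0}

# What the "Stages" page shows today: time since the stage was submitted.
# This includes time spent waiting on previous stages.
wall_duration = stage["completed"] - stage["submitted"]

# What the issue asks for: time actually spent processing tasks.
processing_time = stage["completed"] - stage["first_task_launched"]

print(wall_duration, processing_time)  # 70.0 10.0 -- "slow" stage, fast work
```

The gap between the two numbers (60 s here) is exactly the scheduling wait that makes the current column misleading.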
[jira] [Assigned] (SPARK-10930) History "Stages" page "duration" can be confusing
[ https://issues.apache.org/jira/browse/SPARK-10930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-10930:
------------------------------------

    Assignee: Apache Spark

> History "Stages" page "duration" can be confusing
> -------------------------------------------------
>
> Key: SPARK-10930
> URL: https://issues.apache.org/jira/browse/SPARK-10930
> Project: Spark
> Issue Type: Improvement
> Components: Web UI
> Affects Versions: 1.5.1
> Reporter: Thomas Graves
> Assignee: Apache Spark
>
> The Spark history server's "Stages" page shows each stage's submitted time
> and duration. The duration can be confusing, since the time a stage actually
> starts tasks might be much later than when it was submitted if it is waiting
> on previous stages. This makes it hard to figure out which stages were really
> slow without clicking into each stage.
> It would be nice to perhaps have a first-task-launched time or a processing
> time for each stage, to easily be able to find the slow stages.
[jira] [Assigned] (SPARK-11039) Document all UI "retained*" configurations
[ https://issues.apache.org/jira/browse/SPARK-11039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11039:
------------------------------------

    Assignee: (was: Apache Spark)

> Document all UI "retained*" configurations
> ------------------------------------------
>
> Key: SPARK-11039
> URL: https://issues.apache.org/jira/browse/SPARK-11039
> Project: Spark
> Issue Type: Documentation
> Components: Documentation, Web UI
> Affects Versions: 1.5.1
> Reporter: Nick Pritchard
> Priority: Trivial
>
> Most are documented except these:
> - spark.sql.ui.retainedExecutions
> - spark.streaming.ui.retainedBatches
> They are really helpful for managing the memory usage of the driver
> application.
[jira] [Assigned] (SPARK-11039) Document all UI "retained*" configurations
[ https://issues.apache.org/jira/browse/SPARK-11039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11039:
------------------------------------

    Assignee: Apache Spark

> Document all UI "retained*" configurations
> ------------------------------------------
>
> Key: SPARK-11039
> URL: https://issues.apache.org/jira/browse/SPARK-11039
> Project: Spark
> Issue Type: Documentation
> Components: Documentation, Web UI
> Affects Versions: 1.5.1
> Reporter: Nick Pritchard
> Assignee: Apache Spark
> Priority: Trivial
>
> Most are documented except these:
> - spark.sql.ui.retainedExecutions
> - spark.streaming.ui.retainedBatches
> They are really helpful for managing the memory usage of the driver
> application.
[jira] [Commented] (SPARK-10306) sbt hive/update issue
[ https://issues.apache.org/jira/browse/SPARK-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951362#comment-14951362 ]

Jakob Odersky commented on SPARK-10306:
---------------------------------------

Same issue here.

> sbt hive/update issue
> ---------------------
>
> Key: SPARK-10306
> URL: https://issues.apache.org/jira/browse/SPARK-10306
> Project: Spark
> Issue Type: Bug
> Components: Build
> Reporter: holdenk
> Priority: Trivial
>
> Running sbt hive/update sometimes results in the error "impossible to get
> artifacts when data has not been loaded. IvyNode =
> org.scala-lang#scala-library;2.10.3", which is unfortunate since that version
> is currently always evicted by 2.10.4. An easy (but maybe not super clean)
> solution would be adding 2.10.3 as a dependency, which would then get evicted.
[jira] [Commented] (SPARK-9963) ML RandomForest cleanup: Move predictNodeIndex to LearningNode
[ https://issues.apache.org/jira/browse/SPARK-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949940#comment-14949940 ]

Luvsandondov Lkhamsuren commented on SPARK-9963:
------------------------------------------------

Thanks for the tip. I fixed the original PR too.

> ML RandomForest cleanup: Move predictNodeIndex to LearningNode
> --------------------------------------------------------------
>
> Key: SPARK-9963
> URL: https://issues.apache.org/jira/browse/SPARK-9963
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: Joseph K. Bradley
> Priority: Trivial
> Labels: starter
>
> (updated from the original description)
> Move ml.tree.impl.RandomForest.predictNodeIndex to LearningNode.
> We need to keep it as a separate method from Node.predictImpl because (a) it
> needs to operate on binned features and (b) it needs to return the node ID,
> not the node (because it can return the ID for nodes which do not yet exist).
[jira] [Commented] (SPARK-8658) AttributeReference equals method only compare name, exprId and dataType
[ https://issues.apache.org/jira/browse/SPARK-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949946#comment-14949946 ]

Xiao Li commented on SPARK-8658:
--------------------------------

Hi, Michael and Antonio,

Trying to understand the problem and fix it if I can. The expression IDs are the same, but their qualifiers are different? Could you give a sample query? I am trying to reproduce the problem. Is this related to a self join?

Thanks,

Xiao Li

> AttributeReference equals method only compare name, exprId and dataType
> -----------------------------------------------------------------------
>
> Key: SPARK-8658
> URL: https://issues.apache.org/jira/browse/SPARK-8658
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.3.0, 1.3.1, 1.4.0
> Reporter: Antonio Jesus Navarro
>
> The AttributeReference "equals" method only treats objects as different when
> they have a different name, expression id, or dataType. With this behavior,
> when I tried to do a "transformExpressionsDown" and transform qualifiers
> inside "AttributeReferences", these objects were not replaced, because the
> transformer considers them equal.
> I propose adding these variables to the "equals" method:
> name, dataType, nullable, metadata, exprId, qualifiers
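The proposed fix is a change of equality semantics: include qualifiers (and the other listed fields) in `equals` so that two references to the same column under different table aliases compare as unequal. A Python sketch of the after state using a dataclass-generated `__eq__` (field names mirror the issue but the class is illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AttributeRef:
    # All four fields participate in the generated __eq__ / __hash__,
    # unlike the reported behavior where only the first three mattered.
    name: str
    expr_id: int
    data_type: str
    qualifiers: tuple = ()

a = AttributeRef("col", 1, "int", ("t1",))
b = AttributeRef("col", 1, "int", ("t2",))
print(a == b)  # False: same name/exprId/dataType, different qualifiers
```

This is why the reporter's `transformExpressionsDown` no-op makes sense: a tree transformer that replaces nodes only when the replacement differs under `equals` will skip a qualifier-only change if qualifiers are excluded from equality.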
[jira] [Updated] (SPARK-11022) Spark Worker process find Memory leak after long time running
[ https://issues.apache.org/jira/browse/SPARK-11022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

colin shaw updated SPARK-11022:
-------------------------------
    Description: 
The Worker process often goes down. There were not any abnormal tasks; it just crashed without any message. After adding "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${SPARK_HOME}/logs", a dump file shows there are "17,010 instances of "org.apache.spark.deploy.worker.ExecutorRunner", loaded by "sun.misc.Launcher$AppClassLoader @ 0xe2abfcc8" occupy 496,706,920 (96.14%) bytes."
All the instances were stored in an "org.apache.spark.deploy.worker.Worker" instance; its finishedExecutors field holds many ExecutorRunner instances.
The code (Worker.scala) shows finishedExecutors only does "finishedExecutors(fullId) = executor" and "finishedExecutors.values.toList"; there is no action that removes executors, so all of them stayed in memory, and after running for a long time the process crashed.

  was:
The Worker process often goes down. There was not any abnormal task; it just crashed without any message. After adding "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${SPARK_HOME}/logs", a dump file shows there are "17,010 instances of "org.apache.spark.deploy.worker.ExecutorRunner", loaded by "sun.misc.Launcher$AppClassLoader @ 0xe2abfcc8" occupy 496,706,920 (96.14%) bytes."
All the instances were stored in an "org.apache.spark.deploy.worker.Worker" instance; its finishedExecutors field holds many ExecutorRunner instances.
The code (Worker.scala) shows finishedExecutors only does "finishedExecutors(fullId) = executor" and "finishedExecutors.values.toList"; there is no action that removes executors, so all of them stayed in memory, and after running for a long time the process crashed.

> Spark Worker process find Memory leak after long time running
> -------------------------------------------------------------
>
> Key: SPARK-11022
> URL: https://issues.apache.org/jira/browse/SPARK-11022
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.4.0
> Reporter: colin shaw
>
> The Worker process often goes down. There were not any abnormal tasks; it just
> crashed without any message. After adding "-XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=${SPARK_HOME}/logs", a dump file shows there are "17,010
> instances of "org.apache.spark.deploy.worker.ExecutorRunner", loaded by
> "sun.misc.Launcher$AppClassLoader @ 0xe2abfcc8" occupy 496,706,920 (96.14%)
> bytes."
> All the instances were stored in an "org.apache.spark.deploy.worker.Worker"
> instance; its finishedExecutors field holds many ExecutorRunner instances.
> The code (Worker.scala) shows finishedExecutors only does
> "finishedExecutors(fullId) = executor" and
> "finishedExecutors.values.toList"; there is no action that removes executors,
> so all of them stayed in memory, and after running for a long time the
> process crashed.
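The usual fix for an unbounded "finished" map like the one described above is to cap it and evict the oldest entries on insert. A hedged Python sketch of that bounding strategy (the cap constant and function names are illustrative, not Worker.scala's actual code):

```python
from collections import OrderedDict

RETAINED_EXECUTORS = 3  # in Spark this would come from a retained-executors config

finished = OrderedDict()  # insertion-ordered map: oldest entry first

def record_finished(full_id, runner):
    """Like `finishedExecutors(fullId) = executor`, but bounded."""
    finished[full_id] = runner
    while len(finished) > RETAINED_EXECUTORS:
        finished.popitem(last=False)  # evict the oldest instead of keeping all

for i in range(10):
    record_finished(f"app-exec-{i}", object())

print(list(finished))  # only the 3 most recently finished ids remain
```

With this in place the map's size is O(cap) regardless of uptime, which removes the long-running-process growth the heap dump shows.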
[jira] [Updated] (SPARK-11022) Spark Worker process find Memory leaking after long time running
[ https://issues.apache.org/jira/browse/SPARK-11022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

colin shaw updated SPARK-11022:
-------------------------------
    Summary: Spark Worker process find Memory leaking after long time running (was: Spark Worker process find Memory leak after long time running)

> Spark Worker process find Memory leaking after long time running
> ----------------------------------------------------------------
>
> Key: SPARK-11022
> URL: https://issues.apache.org/jira/browse/SPARK-11022
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.4.0
> Reporter: colin shaw
>
> The Worker process often goes down. There were not any abnormal tasks; it just
> crashed without any message. After adding "-XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=${SPARK_HOME}/logs", a dump file shows there are "17,010
> instances of "org.apache.spark.deploy.worker.ExecutorRunner", loaded by
> "sun.misc.Launcher$AppClassLoader @ 0xe2abfcc8" occupy 496,706,920 (96.14%)
> bytes."
> Almost all the instances were stored in an
> "org.apache.spark.deploy.worker.Worker" instance; its finishedExecutors field
> holds many ExecutorRunner instances.
> The code (Worker.scala) shows finishedExecutors only does
> "finishedExecutors(fullId) = executor" and
> "finishedExecutors.values.toList"; there is no action that removes executors,
> so all of them stayed in memory, and after running for a long time the
> process crashed.
[jira] [Updated] (SPARK-11022) Spark Worker process find Memory leak after long time running
[ https://issues.apache.org/jira/browse/SPARK-11022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

colin shaw updated SPARK-11022:
-------------------------------
    Description: 
The Worker process often goes down. There were not any abnormal tasks; it just crashed without any message. After adding "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${SPARK_HOME}/logs", a dump file shows there are "17,010 instances of "org.apache.spark.deploy.worker.ExecutorRunner", loaded by "sun.misc.Launcher$AppClassLoader @ 0xe2abfcc8" occupy 496,706,920 (96.14%) bytes."
Almost all the instances were stored in an "org.apache.spark.deploy.worker.Worker" instance; its finishedExecutors field holds many ExecutorRunner instances.
The code (Worker.scala) shows finishedExecutors only does "finishedExecutors(fullId) = executor" and "finishedExecutors.values.toList"; there is no action that removes executors, so all of them stayed in memory, and after running for a long time the process crashed.

  was:
The Worker process often goes down. There were not any abnormal tasks; it just crashed without any message. After adding "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${SPARK_HOME}/logs", a dump file shows there are "17,010 instances of "org.apache.spark.deploy.worker.ExecutorRunner", loaded by "sun.misc.Launcher$AppClassLoader @ 0xe2abfcc8" occupy 496,706,920 (96.14%) bytes."
All the instances were stored in an "org.apache.spark.deploy.worker.Worker" instance; its finishedExecutors field holds many ExecutorRunner instances.
The code (Worker.scala) shows finishedExecutors only does "finishedExecutors(fullId) = executor" and "finishedExecutors.values.toList"; there is no action that removes executors, so all of them stayed in memory, and after running for a long time the process crashed.

> Spark Worker process find Memory leak after long time running
> -------------------------------------------------------------
>
> Key: SPARK-11022
> URL: https://issues.apache.org/jira/browse/SPARK-11022
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.4.0
> Reporter: colin shaw
>
> The Worker process often goes down. There were not any abnormal tasks; it just
> crashed without any message. After adding "-XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=${SPARK_HOME}/logs", a dump file shows there are "17,010
> instances of "org.apache.spark.deploy.worker.ExecutorRunner", loaded by
> "sun.misc.Launcher$AppClassLoader @ 0xe2abfcc8" occupy 496,706,920 (96.14%)
> bytes."
> Almost all the instances were stored in an
> "org.apache.spark.deploy.worker.Worker" instance; its finishedExecutors field
> holds many ExecutorRunner instances.
> The code (Worker.scala) shows finishedExecutors only does
> "finishedExecutors(fullId) = executor" and
> "finishedExecutors.values.toList"; there is no action that removes executors,
> so all of them stayed in memory, and after running for a long time the
> process crashed.
[jira] [Created] (SPARK-11021) SparkSQL cli throws exception when using with Hive 0.12 metastore in spark-1.5.0 version
iward created SPARK-11021: - Summary: SparkSQL cli throws exception when using with Hive 0.12 metastore in spark-1.5.0 version Key: SPARK-11021 URL: https://issues.apache.org/jira/browse/SPARK-11021 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: iward After upgrading Spark from 1.4.1 to 1.5.0, I get the following exception when I set the following properties in spark-defaults.conf: {noformat} spark.sql.hive.metastore.version=0.12.0 spark.sql.hive.metastore.jars=hive 0.12 jars and hadoop jars {noformat} When I run a task, it throws the following exception: {noformat} java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.sql.hive.client.Shim_v0_12.loadTable(HiveShim.scala:249) at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadTable$1.apply$mcV$sp(ClientWrapper.scala:489) at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadTable$1.apply(ClientWrapper.scala:489) at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$loadTable$1.apply(ClientWrapper.scala:489) at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:256) at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:211) at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:248) at org.apache.spark.sql.hive.client.ClientWrapper.loadTable(ClientWrapper.scala:488) at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:243) at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:127) at
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:263) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:138) at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:927) at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:927) at org.apache.spark.sql.DataFrame.(DataFrame.scala:144) at org.apache.spark.sql.DataFrame.(DataFrame.scala:129) at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51) at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:719) at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:61) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:311) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:165) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) at 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move results from hdfs://ns1/user/dd_edw/warehouse/tmp/gdm_m10_afs_task_process_spark/.hive-staging_hive_2015-10-09_11-34-50_831_2280183503220873069-1/-ext-1 to destination directory: /user/dd_edw/warehouse/tmp/gdm_m10_afs_task_process_spark at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2303) at org.apache.hadoop.hive.ql.metadata.Table.replaceFiles(Table.java:639) at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1441) ... 40 more {noformat}
[jira] [Created] (SPARK-11022) Spark Worker process find Memory leak after long time running
colin shaw created SPARK-11022: -- Summary: Spark Worker process find Memory leak after long time running Key: SPARK-11022 URL: https://issues.apache.org/jira/browse/SPARK-11022 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.4.0 Reporter: colin shaw The Worker process often goes down even though there were no abnormal tasks; it just crashes without any message. After adding "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${SPARK_HOME}/logs", a heap dump showed "17,010 instances of "org.apache.spark.deploy.worker.ExecutorRunner", loaded by "sun.misc.Launcher$AppClassLoader @ 0xe2abfcc8" occupy 496,706,920 (96.14%) bytes." All of these instances were held by a single "org.apache.spark.deploy.worker.Worker" instance: its finishedExecutors field holds many ExecutorRunner objects. The code (Worker.scala) only ever does "finishedExecutors(fullId) = executor" and "finishedExecutors.values.toList"; nothing ever removes an executor, so they all stay in memory and the Worker eventually crashes after running for a long time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
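The report above boils down to an unbounded map: finishedExecutors only ever gains entries and nothing evicts them. A minimal sketch of the obvious fix direction, capping retained entries with insertion-order eviction (the class and method names below are illustrative only, not Spark's actual code or the eventual fix):

```scala
import scala.collection.mutable

// Hypothetical sketch: bound the number of retained finished executors so
// a long-running Worker no longer accumulates ExecutorRunner references.
class FinishedExecutorCache(maxRetained: Int) {
  // LinkedHashMap preserves insertion order, so the head is the oldest entry.
  private val finished = mutable.LinkedHashMap[String, String]()

  def add(fullId: String, executor: String): Unit = {
    finished(fullId) = executor
    // Evict the oldest entries once the cap is exceeded.
    while (finished.size > maxRetained) {
      finished.remove(finished.head._1)
    }
  }

  def values: List[String] = finished.values.toList
}
```

With maxRetained = 2, adding three executors drops the first one instead of keeping all three alive forever.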
[jira] [Commented] (SPARK-11028) When planning queries without partial aggregation support, we should try to use TungstenAggregate.
[ https://issues.apache.org/jira/browse/SPARK-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951210#comment-14951210 ] Josh Rosen commented on SPARK-11028: [~yhuai], if we fix SPARK-10992 first then will we still need to do this? Will it still be the case that _some_ HiveUDAFs don't support partial aggregation, requiring this? > When planning queries without partial aggregation support, we should try to > use TungstenAggregate. > -- > > Key: SPARK-11028 > URL: https://issues.apache.org/jira/browse/SPARK-11028 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > > With SPARK-11017, we can run DeclarativeAggregate Functions in > TungstenAggregate. So, when we plan queries having functions that do not > support partial aggregation, we can use TungstenAggregate whenever possible. > The reason that we only use SortBasedAggregate is that HiveUDAF is the only > function that does not support partial aggregation and it is a > DeclarativeAggregate function. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9241) Supporting multiple DISTINCT columns
[ https://issues.apache.org/jira/browse/SPARK-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951212#comment-14951212 ] Josh Rosen commented on SPARK-9241: --- [~yhuai], [~rxin], would you like to update this ticket based on recent discussions? > Supporting multiple DISTINCT columns > > > Key: SPARK-9241 > URL: https://issues.apache.org/jira/browse/SPARK-9241 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Priority: Critical > > Right now the new aggregation code path only support a single distinct column > (you can use it in multiple aggregate functions in the query). We need to > support multiple distinct columns by generating a different plan for handling > multiple distinct columns (without change aggregate functions). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-10535) Support for recommendUsersForProducts and recommendProductsForUsers in matrix factorization model for PySpark
[ https://issues.apache.org/jira/browse/SPARK-10535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-10535. Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 8700 [https://github.com/apache/spark/pull/8700] > Support for recommendUsersForProducts and recommendProductsForUsers in > matrix factorization model for PySpark > -- > > Key: SPARK-10535 > URL: https://issues.apache.org/jira/browse/SPARK-10535 > Project: Spark > Issue Type: New Feature > Components: MLlib, PySpark >Affects Versions: 1.4.1, 1.5.0 >Reporter: Vladimir Vladimirov >Assignee: Vladimir Vladimirov > Fix For: 1.6.0 > > > Scala and Java API provides recommendUsersForProducts > recommendProductsForUsers methods, but PySpark MLlib API doesn't have them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-10858) YARN: archives/jar/files rename with # doesn't work unless scheme given
[ https://issues.apache.org/jira/browse/SPARK-10858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-10858. Resolution: Fixed Fix Version/s: 1.6.0 1.5.2 > YARN: archives/jar/files rename with # doesn't work unless scheme given > --- > > Key: SPARK-10858 > URL: https://issues.apache.org/jira/browse/SPARK-10858 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.5.1 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Minor > Fix For: 1.5.2, 1.6.0 > > > The YARN distributed cache feature with --jars, --archives, --files where you > can rename the file/archive using a # symbol only works if you explicitly > include the scheme in the path: > works: > --jars file:///home/foo/my.jar#renamed.jar > doesn't work: > --jars /home/foo/my.jar#renamed.jar > Exception in thread "main" java.io.FileNotFoundException: File > file:/home/foo/my.jar#renamed.jar does not exist > at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:416) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289) > at > org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:240) > at > org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:329) > at > org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6$$anonfun$apply$2.apply(Client.scala:393) > at > org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6$$anonfun$apply$2.apply(Client.scala:392) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at 
scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11037) Cleanup Option usage in JdbcUtils
Rick Hillegas created SPARK-11037: - Summary: Cleanup Option usage in JdbcUtils Key: SPARK-11037 URL: https://issues.apache.org/jira/browse/SPARK-11037 Project: Spark Issue Type: Task Components: SQL Affects Versions: 1.5.1 Reporter: Rick Hillegas Priority: Trivial The following issue came up in the review of the pull request for SPARK-10855 (https://github.com/apache/spark/pull/8982): We should use Option(...) instead of Some(...) because the former handles null arguments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
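The review point above is easy to demonstrate: Option(...) null-checks its argument, while Some(...) wraps it blindly, so Some(null) produces a defined-but-null value. A minimal sketch:

```scala
// Option(x) returns None when x is null; Some(x) never does.
// Some(null) therefore smuggles a null through an API that looks null-safe.
val fromNull: Option[String] = Option(null) // None
val wrapped: Option[String] = Some(null)    // Some(null)

assert(fromNull.isEmpty)
assert(wrapped.isDefined) // the null is still in there
```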
[jira] [Commented] (SPARK-10984) Simplify *MemoryManager class structure
[ https://issues.apache.org/jira/browse/SPARK-10984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951343#comment-14951343 ] Bowen Zhang commented on SPARK-10984: - [~andrewor14], sure, assign that to me. > Simplify *MemoryManager class structure > --- > > Key: SPARK-10984 > URL: https://issues.apache.org/jira/browse/SPARK-10984 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Reporter: Andrew Or >Assignee: Josh Rosen > > This is a refactoring task. > After SPARK-10956 gets merged, we will have the following: > - MemoryManager > - StaticMemoryManager > - ExecutorMemoryManager > - TaskMemoryManager > - ShuffleMemoryManager > This is pretty confusing. The goal is to merge ShuffleMemoryManager and > ExecutorMemoryManager and move them into the top-level MemoryManager abstract > class. Then TaskMemoryManager should be renamed something else and used by > MemoryManager, such that the new hierarchy becomes: > - MemoryManager > - StaticMemoryManager -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11035) Launcher: allow apps to be launched in-process
Marcelo Vanzin created SPARK-11035: -- Summary: Launcher: allow apps to be launched in-process Key: SPARK-11035 URL: https://issues.apache.org/jira/browse/SPARK-11035 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 1.6.0 Reporter: Marcelo Vanzin The launcher library is currently restricted to launching apps as child processes. That is fine for a lot of cases, especially if the app is running in client mode. But in certain cases, especially launching in cluster mode, it's more efficient to avoid launching a new process, since that process won't be doing much. We should add support for launching apps in process, even if restricted to cluster mode at first. This will require some rework of the launch paths to avoid using system properties to propagate configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11039) Document all UI "retained*" configurations
Nick Pritchard created SPARK-11039: -- Summary: Document all UI "retained*" configurations Key: SPARK-11039 URL: https://issues.apache.org/jira/browse/SPARK-11039 Project: Spark Issue Type: Documentation Components: Documentation, Web UI Affects Versions: 1.5.1 Reporter: Nick Pritchard Priority: Trivial Most are documented except these: - spark.sql.ui.retainedExecutions - spark.streaming.ui.retainedBatches They are really helpful for managing the memory usage of the driver application. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
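For context, both properties are set like any other Spark entry in spark-defaults.conf; the values below are illustrative examples, not documented defaults:

```properties
# Cap how much UI state the driver retains (example values only).
spark.sql.ui.retainedExecutions    100
spark.streaming.ui.retainedBatches 100
```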
[jira] [Commented] (SPARK-9241) Supporting multiple DISTINCT columns
[ https://issues.apache.org/jira/browse/SPARK-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951290#comment-14951290 ] Yin Huai commented on SPARK-9241: - Yeah. When we compile the query, we can split the queries with multiple distinct columns to multiple queries. Every query evaluates a single distinct aggregation. Then, we can join the results using the group by keys as the join keys. In the join, we need to use null safe equality as the condition. Right now, we need to have another optimization to make it work efficiently. Here is an example, {code} SELECT COUNT(DISTINCT a), COUNT(DISTINCT b), c FROM t GROUP BY c {code} will be rewritten to {code} SELECT x.count_a, y.count_b, x.c FROM (SELECT COUNT(DISTINCT a) count_a FROM t GROUP BY c) x JOIN (SELECT COUNT(DISTINCT b) count_b FROM t GROUP BY c) y ON coalesce(x.c, 0) = coalesce(y.c, 0) AND x.c <=> y.c {code} > Supporting multiple DISTINCT columns > > > Key: SPARK-9241 > URL: https://issues.apache.org/jira/browse/SPARK-9241 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Priority: Critical > > Right now the new aggregation code path only support a single distinct column > (you can use it in multiple aggregate functions in the query). We need to > support multiple distinct columns by generating a different plan for handling > multiple distinct columns (without change aggregate functions). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
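The rewrite sketched above hinges on null-safe equality (<=>), which treats two NULL group keys as equal so that rows grouped under a NULL c still join up. A standalone sketch of those semantics in plain Scala (no Spark), with Option standing in for a nullable SQL value:

```scala
// Null-safe equality: NULL <=> NULL is true, NULL <=> x is false,
// and two non-null values compare with ordinary equality.
def nullSafeEq[A](x: Option[A], y: Option[A]): Boolean = (x, y) match {
  case (None, None)       => true  // two NULLs compare equal
  case (Some(a), Some(b)) => a == b
  case _                  => false // NULL vs non-NULL
}
```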
[jira] [Resolved] (SPARK-8673) Launcher: add support for monitoring launched applications
[ https://issues.apache.org/jira/browse/SPARK-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid resolved SPARK-8673. - Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 7052 [https://github.com/apache/spark/pull/7052] > Launcher: add support for monitoring launched applications > -- > > Key: SPARK-8673 > URL: https://issues.apache.org/jira/browse/SPARK-8673 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 1.5.0 >Reporter: Marcelo Vanzin >Assignee: Marcelo Vanzin > Fix For: 1.6.0 > > > See parent bug for details. > This task covers adding the groundwork for being able to communicate with the > launched Spark application and provide ways for the code using the launcher > library to interact with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11017) Support ImperativeAggregates in TungstenAggregate
[ https://issues.apache.org/jira/browse/SPARK-11017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-11017: --- Issue Type: Sub-task (was: Improvement) Parent: SPARK-4366 > Support ImperativeAggregates in TungstenAggregate > - > > Key: SPARK-11017 > URL: https://issues.apache.org/jira/browse/SPARK-11017 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Josh Rosen >Assignee: Josh Rosen > > The TungstenAggregate operator currently only supports DeclarativeAggregate > functions (i.e. expression-based aggregates); we should extend it to also > support ImperativeAggregate functions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11009) RowNumber in HiveContext returns negative values in cluster mode
[ https://issues.apache.org/jira/browse/SPARK-11009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu reassigned SPARK-11009: -- Assignee: Davies Liu > RowNumber in HiveContext returns negative values in cluster mode > > > Key: SPARK-11009 > URL: https://issues.apache.org/jira/browse/SPARK-11009 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.1 > Environment: Standalone cluster mode. No hadoop/hive is present in > the environment (no hive-site.xml), only using HiveContext. Spark build as > with hadoop 2.6.0. Default spark configuration variables. cluster has 4 > nodes, but happens with n nodes as well. >Reporter: Saif Addin Ellafi >Assignee: Davies Liu > > This issue happens when submitting the job into a standalone cluster. Have > not tried YARN or MESOS. Repartition df into 1 piece or default parallelism=1 > does not fix the issue. Also tried having only one node in the cluster, with > same result. Other shuffle configuration changes do not alter the results > either. > The issue does NOT happen in --master local[*]. > val ws = Window. > partitionBy("client_id"). 
> orderBy("date") > > val nm = "repeatMe" > df.select(df.col("*"), rowNumber().over(ws).as(nm)) > > > df.filter(df("repeatMe").isNotNull).orderBy("repeatMe").take(50).foreach(println(_)) > > ---> > > Long, DateType, Int > [219483904822,2006-06-01,-1863462909] > [219483904822,2006-09-01,-1863462909] > [219483904822,2007-01-01,-1863462909] > [219483904822,2007-08-01,-1863462909] > [219483904822,2007-07-01,-1863462909] > [192489238423,2007-07-01,-1863462774] > [192489238423,2007-02-01,-1863462774] > [192489238423,2006-11-01,-1863462774] > [192489238423,2006-08-01,-1863462774] > [192489238423,2007-08-01,-1863462774] > [192489238423,2006-09-01,-1863462774] > [192489238423,2007-03-01,-1863462774] > [192489238423,2006-10-01,-1863462774] > [192489238423,2007-05-01,-1863462774] > [192489238423,2006-06-01,-1863462774] > [192489238423,2006-12-01,-1863462774] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10988) Reduce duplication in Aggregate2's expression rewriting logic
[ https://issues.apache.org/jira/browse/SPARK-10988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10988: --- Issue Type: Sub-task (was: Bug) Parent: SPARK-4366 > Reduce duplication in Aggregate2's expression rewriting logic > - > > Key: SPARK-10988 > URL: https://issues.apache.org/jira/browse/SPARK-10988 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Josh Rosen >Assignee: Josh Rosen > Fix For: 1.6.0 > > > In `aggregate/utils.scala`, there is a substantial amount of duplication in > the expression-rewriting logic. As a prerequisite to supporting imperative > aggregate functions in `TungstenAggregate`, we should refactor this file so > that the same expression-rewriting logic is used for both `SortAggregate` and > `TungstenAggregate`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10941) Refactor AggregateFunction2 and AlgebraicAggregate interfaces to improve code clarity
[ https://issues.apache.org/jira/browse/SPARK-10941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10941: --- Issue Type: Sub-task (was: Improvement) Parent: SPARK-4366 > Refactor AggregateFunction2 and AlgebraicAggregate interfaces to improve > code clarity > -- > > Key: SPARK-10941 > URL: https://issues.apache.org/jira/browse/SPARK-10941 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Josh Rosen >Assignee: Josh Rosen > Fix For: 1.6.0 > > > Spark SQL's new AlgebraicAggregate interface is confusingly named. > AlgebraicAggregate inherits from AggregateFunction2, adds a new set of > methods, then effectively bans the use of the inherited methods. This is > really confusing. I think that it's an anti-pattern / bad code smell if you > end up inheriting and wanting to remove methods inherited from the superclass. > I think that we should re-name this class and should refactor the class > hierarchy so that there's a clear distinction between which parts of the code > work with imperative aggregate functions vs. expression-based aggregates. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4429) Build for Scala 2.11 using sbt fails.
[ https://issues.apache.org/jira/browse/SPARK-4429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951203#comment-14951203 ] Peter Halliday commented on SPARK-4429: --- I'm wondering where this is at? > Build for Scala 2.11 using sbt fails. > - > > Key: SPARK-4429 > URL: https://issues.apache.org/jira/browse/SPARK-4429 > Project: Spark > Issue Type: Bug > Components: Build >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin > Fix For: 1.2.0 > > > I tried to build for Scala 2.11 using sbt with the following command: > {quote} > $ sbt/sbt -Dscala-2.11 assembly > {quote} > but it ends with the following error messages: > {quote} > \[error\] (streaming-kafka/*:update) sbt.ResolveException: unresolved > dependency: org.apache.kafka#kafka_2.11;0.8.0: not found > \[error\] (catalyst/*:update) sbt.ResolveException: unresolved dependency: > org.scalamacros#quasiquotes_2.11;2.0.1: not found > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951238#comment-14951238 ] Charles Allen commented on SPARK-8142: -- I had a similar failure as topic and solved it by setting "spark.executor.userClassPathFirst" to "false" and "spark.driver.userClassPathFirst" to "false" > Spark Job Fails with ResultTask ClassCastException > -- > > Key: SPARK-8142 > URL: https://issues.apache.org/jira/browse/SPARK-8142 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.1 >Reporter: Dev Lakhani > > When running a Spark Job, I get no failures in the application code > whatsoever but a weird ResultTask Class exception. In my job, I create a RDD > from HBase and for each partition do a REST call on an API, using a REST > client. This has worked in IntelliJ but when I deploy to a cluster using > spark-submit.sh I get : > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 > (TID 3, host): java.lang.ClassCastException: > org.apache.spark.scheduler.ResultTask cannot be cast to > org.apache.spark.scheduler.Task > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > These are the configs I set to override the spark classpath because I want to > use my own glassfish jersey version: > > sparkConf.set("spark.driver.userClassPathFirst","true"); > sparkConf.set("spark.executor.userClassPathFirst","true"); > I see no other warnings or errors in any of the logs. > Unfortunately I cannot post my code, but please ask me questions that will > help debug the issue. Using spark 1.3.1 hadoop 2.6. 
[jira] [Assigned] (SPARK-10930) History "Stages" page "duration" can be confusing
[ https://issues.apache.org/jira/browse/SPARK-10930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10930: Assignee: (was: Apache Spark) > History "Stages" page "duration" can be confusing > - > > Key: SPARK-10930 > URL: https://issues.apache.org/jira/browse/SPARK-10930 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 1.5.1 >Reporter: Thomas Graves > > The spark history server, "stages" page shows each stage submitted time and > the duration. The duration can be confusing since the time it actually > starts tasks might be much later then its submitted if its waiting on > previous stages. This makes it hard to figure out which stages were really > slow without clicking into each stage. > It would be nice to perhaps have a first task launched time or processing > time spent in each stage to easily be able to find the slow stages. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11039) Document all UI "retained*" configurations
[ https://issues.apache.org/jira/browse/SPARK-11039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951282#comment-14951282 ] Apache Spark commented on SPARK-11039: -- User 'pnpritchard' has created a pull request for this issue: https://github.com/apache/spark/pull/9052 > Document all UI "retained*" configurations > -- > > Key: SPARK-11039 > URL: https://issues.apache.org/jira/browse/SPARK-11039 > Project: Spark > Issue Type: Documentation > Components: Documentation, Web UI >Affects Versions: 1.5.1 >Reporter: Nick Pritchard >Priority: Trivial > > Most are documented except these: > - spark.sql.ui.retainedExecutions > - spark.streaming.ui.retainedBatches > They are really helpful for managing the memory usage of the driver > application. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11040) SaslRpcHandler does not delegate all methods to underlying handler
Marcelo Vanzin created SPARK-11040: -- Summary: SaslRpcHandler does not delegate all methods to underlying handler Key: SPARK-11040 URL: https://issues.apache.org/jira/browse/SPARK-11040 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.6.0 Reporter: Marcelo Vanzin {{SaslRpcHandler}} only delegates {{receive}} and {{getStreamManager}}, so when SASL is enabled, other events will be missed by apps. This affects other versions too, but I think these events aren't actually used there. They'll be used by the new rpc backend in 1.6, though. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-10855) Add a JDBC dialect for Apache Derby
[ https://issues.apache.org/jira/browse/SPARK-10855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-10855. - Resolution: Fixed Assignee: Rick Hillegas Fix Version/s: 1.6.0 > Add a JDBC dialect for Apache Derby > > > Key: SPARK-10855 > URL: https://issues.apache.org/jira/browse/SPARK-10855 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.0 >Reporter: Rick Hillegas >Assignee: Rick Hillegas >Priority: Minor > Fix For: 1.6.0 > > > In particular, it would be good if the dialect could handle Derby's > user-defined types. The following script fails: > {noformat} > import org.apache.spark.sql._ > import org.apache.spark.sql.types._ > // the following script was used to create a Derby table > // which has a column of user-defined type: > // > // create type properties external name 'java.util.Properties' language java; > // > // create function systemProperties() returns properties > // language java parameter style java no sql > // external name 'java.lang.System.getProperties'; > // > // create table propertiesTable( props properties ); > // > // insert into propertiesTable values ( null ), ( systemProperties() ); > // > // select * from propertiesTable; > // cannot handle a table which has a column of type > java.sql.Types.JAVA_OBJECT: > // > // java.sql.SQLException: Unsupported type 2000 > // > val df = sqlContext.read.format("jdbc").options( > Map("url" -> "jdbc:derby:/Users/rhillegas/derby/databases/derby1", > "dbtable" -> "app.propertiesTable")).load() > // shutdown the Derby engine > val shutdown = sqlContext.read.format("jdbc").options( > Map("url" -> "jdbc:derby:;shutdown=true", > "dbtable" -> "")).load() > exit() > {noformat} > The inability to handle user-defined types probably affects other databases > besides Derby. 
[jira] [Created] (SPARK-11036) AttributeReference should not be created outside driver
Davies Liu created SPARK-11036: -- Summary: AttributeReference should not be created outside driver Key: SPARK-11036 URL: https://issues.apache.org/jira/browse/SPARK-11036 Project: Spark Issue Type: Improvement Components: SQL Reporter: Davies Liu If an AttributeReference is created on an executor, its id could be the same as ids created on the driver. We should have a way to ban that.
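A hedged sketch of why executor-side creation is dangerous. The generator below is illustrative, not Spark's exact code: if expression ids come from a per-JVM counter, a fresh executor JVM restarts the sequence and can mint ids the driver has already handed out.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative per-JVM id counter (not Spark's actual implementation).
class ExprIdGenerator {
    private final AtomicLong next = new AtomicLong(0);
    long newId() { return next.getAndIncrement(); }
}

public class IdCollisionDemo {
    public static void main(String[] args) {
        // In reality these counters live in two different JVMs; two fresh
        // instances model the driver and an executor starting from zero.
        ExprIdGenerator driverJvm = new ExprIdGenerator();
        ExprIdGenerator executorJvm = new ExprIdGenerator();
        long driverId = driverJvm.newId();
        long executorId = executorJvm.newId();
        System.out.println(driverId == executorId); // true: both counters started at 0
    }
}
```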
[jira] [Assigned] (SPARK-11009) RowNumber in HiveContext returns negative values in cluster mode
[ https://issues.apache.org/jira/browse/SPARK-11009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11009: Assignee: Apache Spark > RowNumber in HiveContext returns negative values in cluster mode > > > Key: SPARK-11009 > URL: https://issues.apache.org/jira/browse/SPARK-11009 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.1 > Environment: Standalone cluster mode. No hadoop/hive is present in > the environment (no hive-site.xml), only using HiveContext. Spark build as > with hadoop 2.6.0. Default spark configuration variables. cluster has 4 > nodes, but happens with n nodes as well. >Reporter: Saif Addin Ellafi >Assignee: Apache Spark > > This issue happens when submitting the job into a standalone cluster. Have > not tried YARN or MESOS. Repartition df into 1 piece or default parallelism=1 > does not fix the issue. Also tried having only one node in the cluster, with > same result. Other shuffle configuration changes do not alter the results > either. > The issue does NOT happen in --master local[*]. > val ws = Window. > partitionBy("client_id"). 
> orderBy("date") > > val nm = "repeatMe" > df.select(df.col("*"), rowNumber().over(ws).as(nm)) > > > df.filter(df("repeatMe").isNotNull).orderBy("repeatMe").take(50).foreach(println(_)) > > ---> > > Long, DateType, Int > [219483904822,2006-06-01,-1863462909] > [219483904822,2006-09-01,-1863462909] > [219483904822,2007-01-01,-1863462909] > [219483904822,2007-08-01,-1863462909] > [219483904822,2007-07-01,-1863462909] > [192489238423,2007-07-01,-1863462774] > [192489238423,2007-02-01,-1863462774] > [192489238423,2006-11-01,-1863462774] > [192489238423,2006-08-01,-1863462774] > [192489238423,2007-08-01,-1863462774] > [192489238423,2006-09-01,-1863462774] > [192489238423,2007-03-01,-1863462774] > [192489238423,2006-10-01,-1863462774] > [192489238423,2007-05-01,-1863462774] > [192489238423,2006-06-01,-1863462774] > [192489238423,2006-12-01,-1863462774] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11009) RowNumber in HiveContext returns negative values in cluster mode
[ https://issues.apache.org/jira/browse/SPARK-11009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951202#comment-14951202 ] Apache Spark commented on SPARK-11009: -- User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/9050 > RowNumber in HiveContext returns negative values in cluster mode > > > Key: SPARK-11009 > URL: https://issues.apache.org/jira/browse/SPARK-11009 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.1 > Environment: Standalone cluster mode. No hadoop/hive is present in > the environment (no hive-site.xml), only using HiveContext. Spark build as > with hadoop 2.6.0. Default spark configuration variables. cluster has 4 > nodes, but happens with n nodes as well. >Reporter: Saif Addin Ellafi > > This issue happens when submitting the job into a standalone cluster. Have > not tried YARN or MESOS. Repartition df into 1 piece or default parallelism=1 > does not fix the issue. Also tried having only one node in the cluster, with > same result. Other shuffle configuration changes do not alter the results > either. > The issue does NOT happen in --master local[*]. > val ws = Window. > partitionBy("client_id"). 
> orderBy("date") > > val nm = "repeatMe" > df.select(df.col("*"), rowNumber().over(ws).as(nm)) > > > df.filter(df("repeatMe").isNotNull).orderBy("repeatMe").take(50).foreach(println(_)) > > ---> > > Long, DateType, Int > [219483904822,2006-06-01,-1863462909] > [219483904822,2006-09-01,-1863462909] > [219483904822,2007-01-01,-1863462909] > [219483904822,2007-08-01,-1863462909] > [219483904822,2007-07-01,-1863462909] > [192489238423,2007-07-01,-1863462774] > [192489238423,2007-02-01,-1863462774] > [192489238423,2006-11-01,-1863462774] > [192489238423,2006-08-01,-1863462774] > [192489238423,2007-08-01,-1863462774] > [192489238423,2006-09-01,-1863462774] > [192489238423,2007-03-01,-1863462774] > [192489238423,2006-10-01,-1863462774] > [192489238423,2007-05-01,-1863462774] > [192489238423,2006-06-01,-1863462774] > [192489238423,2006-12-01,-1863462774] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11009) RowNumber in HiveContext returns negative values in cluster mode
[ https://issues.apache.org/jira/browse/SPARK-11009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-11009: Assignee: (was: Apache Spark) > RowNumber in HiveContext returns negative values in cluster mode > > > Key: SPARK-11009 > URL: https://issues.apache.org/jira/browse/SPARK-11009 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.1 > Environment: Standalone cluster mode. No hadoop/hive is present in > the environment (no hive-site.xml), only using HiveContext. Spark build as > with hadoop 2.6.0. Default spark configuration variables. cluster has 4 > nodes, but happens with n nodes as well. >Reporter: Saif Addin Ellafi > > This issue happens when submitting the job into a standalone cluster. Have > not tried YARN or MESOS. Repartition df into 1 piece or default parallelism=1 > does not fix the issue. Also tried having only one node in the cluster, with > same result. Other shuffle configuration changes do not alter the results > either. > The issue does NOT happen in --master local[*]. > val ws = Window. > partitionBy("client_id"). 
> orderBy("date") > > val nm = "repeatMe" > df.select(df.col("*"), rowNumber().over(ws).as(nm)) > > > df.filter(df("repeatMe").isNotNull).orderBy("repeatMe").take(50).foreach(println(_)) > > ---> > > Long, DateType, Int > [219483904822,2006-06-01,-1863462909] > [219483904822,2006-09-01,-1863462909] > [219483904822,2007-01-01,-1863462909] > [219483904822,2007-08-01,-1863462909] > [219483904822,2007-07-01,-1863462909] > [192489238423,2007-07-01,-1863462774] > [192489238423,2007-02-01,-1863462774] > [192489238423,2006-11-01,-1863462774] > [192489238423,2006-08-01,-1863462774] > [192489238423,2007-08-01,-1863462774] > [192489238423,2006-09-01,-1863462774] > [192489238423,2007-03-01,-1863462774] > [192489238423,2006-10-01,-1863462774] > [192489238423,2007-05-01,-1863462774] > [192489238423,2006-06-01,-1863462774] > [192489238423,2006-12-01,-1863462774] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-10167) We need to explicitly use transformDown when rewrite aggregation results
[ https://issues.apache.org/jira/browse/SPARK-10167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-10167. Resolution: Fixed Assignee: Josh Rosen Fix Version/s: 1.6.0 I changed {{transform}} to {{transformDown}} as part of my refactorings in SPARK-10988, so I'm going to mark this as resolved. > We need to explicitly use transformDown when rewrite aggregation results > > > Key: SPARK-10167 > URL: https://issues.apache.org/jira/browse/SPARK-10167 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai >Assignee: Josh Rosen >Priority: Minor > Fix For: 1.6.0 > > > Right now, we use transformDown explicitly at > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/utils.scala#L105 > and > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/utils.scala#L130. > We also need to be very clear on using transformDown at > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/utils.scala#L300 > and > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/utils.scala#L334 > (right now transform means transformDown). The reason we need to use > transformDown is when we rewrite final aggregate results, we should always > match aggregate functions first. If we use transformUp, it is possible that > we match grouping expression first if we use grouping expressions as children > of aggregate functions. > There is nothing wrong with our master. We just want to make sure we will not > have bugs if we change the behavior of transform (change it from > transformDown to Up.), which I think is very unlikely (but just incase). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
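The ordering concern is easy to reproduce on a toy expression tree. In this hedged sketch (an illustrative Expr class, not Catalyst's TreeNode API), a rewrite rule can match both an aggregate function and the grouping expression nested inside it: top-down traversal applies the aggregate match first, while bottom-up rewrites the child first, after which the aggregate pattern no longer matches.

```java
import java.util.List;
import java.util.function.Function;

// Toy tree, illustrative only, showing why rewrite order matters when a rule
// can match both an aggregate function and its grouping-expression child.
class Expr {
    final String name;
    final List<Expr> children;
    Expr(String name, Expr... kids) { this.name = name; this.children = List.of(kids); }

    // Apply the rule to this node first, then to the (possibly new) children.
    Expr transformDown(Function<Expr, Expr> rule) {
        Expr r = rule.apply(this);
        Expr[] kids = r.children.stream()
                .map(c -> c.transformDown(rule)).toArray(Expr[]::new);
        return new Expr(r.name, kids);
    }

    // Apply the rule to the children first, then to this node.
    Expr transformUp(Function<Expr, Expr> rule) {
        Expr[] kids = children.stream()
                .map(c -> c.transformUp(rule)).toArray(Expr[]::new);
        return rule.apply(new Expr(name, kids));
    }

    public String toString() { return children.isEmpty() ? name : name + children; }
}

public class TransformOrderDemo {
    public static void main(String[] args) {
        // Rule: replace sum(a) with its final aggregate result; replace a bare
        // grouping expression "a" with an attribute reference.
        Function<Expr, Expr> rule = e -> {
            if (e.name.equals("sum") && !e.children.isEmpty()
                    && e.children.get(0).name.equals("a")) return new Expr("sumResult");
            if (e.name.equals("a")) return new Expr("attr_a");
            return e;
        };
        Expr plan = new Expr("sum", new Expr("a"));
        System.out.println(plan.transformDown(rule)); // sumResult
        System.out.println(plan.transformUp(rule));   // sum[attr_a] -- aggregate never rewritten
    }
}
```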
[jira] [Updated] (SPARK-11009) RowNumber in HiveContext returns negative values in cluster mode
[ https://issues.apache.org/jira/browse/SPARK-11009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-11009: - Target Version/s: 1.5.2, 1.6.0 Priority: Blocker (was: Major) > RowNumber in HiveContext returns negative values in cluster mode > > > Key: SPARK-11009 > URL: https://issues.apache.org/jira/browse/SPARK-11009 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.1 > Environment: Standalone cluster mode. No hadoop/hive is present in > the environment (no hive-site.xml), only using HiveContext. Spark build as > with hadoop 2.6.0. Default spark configuration variables. cluster has 4 > nodes, but happens with n nodes as well. >Reporter: Saif Addin Ellafi >Assignee: Davies Liu >Priority: Blocker > > This issue happens when submitting the job into a standalone cluster. Have > not tried YARN or MESOS. Repartition df into 1 piece or default parallelism=1 > does not fix the issue. Also tried having only one node in the cluster, with > same result. Other shuffle configuration changes do not alter the results > either. > The issue does NOT happen in --master local[*]. > val ws = Window. > partitionBy("client_id"). 
> orderBy("date") > > val nm = "repeatMe" > df.select(df.col("*"), rowNumber().over(ws).as(nm)) > > > df.filter(df("repeatMe").isNotNull).orderBy("repeatMe").take(50).foreach(println(_)) > > ---> > > Long, DateType, Int > [219483904822,2006-06-01,-1863462909] > [219483904822,2006-09-01,-1863462909] > [219483904822,2007-01-01,-1863462909] > [219483904822,2007-08-01,-1863462909] > [219483904822,2007-07-01,-1863462909] > [192489238423,2007-07-01,-1863462774] > [192489238423,2007-02-01,-1863462774] > [192489238423,2006-11-01,-1863462774] > [192489238423,2006-08-01,-1863462774] > [192489238423,2007-08-01,-1863462774] > [192489238423,2006-09-01,-1863462774] > [192489238423,2007-03-01,-1863462774] > [192489238423,2006-10-01,-1863462774] > [192489238423,2007-05-01,-1863462774] > [192489238423,2006-06-01,-1863462774] > [192489238423,2006-12-01,-1863462774] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-10970) Executors overload Hive metastore by making massive connections at execution time
[ https://issues.apache.org/jira/browse/SPARK-10970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park resolved SPARK-10970. --- Resolution: Fixed Closing the jira because this is fixed by SPARK-10679. SPARK-10679 addresses a different issue, but it also fixes this issue as a byproduct. > Executors overload Hive metastore by making massive connections at execution > time > - > > Key: SPARK-10970 > URL: https://issues.apache.org/jira/browse/SPARK-10970 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.1 > Environment: Hive 1.2, Spark on YARN >Reporter: Cheolsoo Park >Priority: Critical > > This is a regression in Spark 1.5, more specifically after upgrading Hive > dependency to 1.2. > HIVE-2573 introduced a new feature that allows users to register functions in > session. The problem is that it added a [static code > block|https://github.com/apache/hive/blob/branch-1.2/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L164-L170] > to Hive.java- > {code} > // register all permanent functions. need improvement > static { > try { > reloadFunctions(); > } catch (Exception e) { > LOG.warn("Failed to access metastore. This class should not accessed in > runtime.",e); > } > } > {code} > This code block is executed by every Spark executor in cluster when HadoopRDD > tries to access to JobConf. So if Spark job has a high parallelism (eg > 1000+), executors will hammer the HCat server causing it to go down in the > worst case. 
> Here is the stack trace that I took in executor when it makes a connection to > Hive metastore- > {code} > 15/10/06 19:26:05 WARN conf.HiveConf: HiveConf of name hive.optimize.s3.query > does not exist > 15/10/06 19:26:05 INFO hive.metastore: XXX: > java.lang.Thread.getStackTrace(Thread.java:1589) > 15/10/06 19:26:05 INFO hive.metastore: XXX: > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:236) > 15/10/06 19:26:05 INFO hive.metastore: XXX: > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:74) > 15/10/06 19:26:05 INFO hive.metastore: XXX: > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > 15/10/06 19:26:05 INFO hive.metastore: XXX: > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > 15/10/06 19:26:05 INFO hive.metastore: XXX: > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > 15/10/06 19:26:05 INFO hive.metastore: XXX: > java.lang.reflect.Constructor.newInstance(Constructor.java:526) > 15/10/06 19:26:05 INFO hive.metastore: XXX: > org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521) > 15/10/06 19:26:05 INFO hive.metastore: XXX: > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:86) > 15/10/06 19:26:05 INFO hive.metastore: XXX: > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132) > 15/10/06 19:26:05 INFO hive.metastore: XXX: > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104) > 15/10/06 19:26:05 INFO hive.metastore: XXX: > org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) > 15/10/06 19:26:05 INFO hive.metastore: XXX: > org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) > 15/10/06 19:26:05 INFO hive.metastore: XXX: > 
org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234) > 15/10/06 19:26:05 INFO hive.metastore: XXX: > org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174) > 15/10/06 19:26:05 INFO hive.metastore: XXX: > org.apache.hadoop.hive.ql.metadata.Hive.(Hive.java:166) > 15/10/06 19:26:05 INFO hive.metastore: XXX: > org.apache.hadoop.hive.ql.plan.PlanUtils.configureJobPropertiesForStorageHandler(PlanUtils.java:803) > 15/10/06 19:26:05 INFO hive.metastore: XXX: > org.apache.hadoop.hive.ql.plan.PlanUtils.configureInputJobPropertiesForStorageHandler(PlanUtils.java:782) > 15/10/06 19:26:05 INFO hive.metastore: XXX: > org.apache.spark.sql.hive.HadoopTableReader$.initializeLocalJobConfFunc(TableReader.scala:347) > 15/10/06 19:26:05 INFO hive.metastore: XXX: > org.apache.spark.sql.hive.HadoopTableReader$anonfun$17.apply(TableReader.scala:322) > 15/10/06 19:26:05 INFO hive.metastore: XXX: > org.apache.spark.sql.hive.HadoopTableReader$anonfun$17.apply(TableReader.scala:322) > 15/10/06 19:26:05 INFO hive.metastore: XXX: > org.apache.spark.rdd.HadoopRDD$anonfun$getJobConf$6.apply(HadoopRDD.scala:179) > 15/10/06 19:26:05 INFO hive.metastore: XXX: > org.apache.spark.rdd.HadoopRDD$anonfun$getJobConf$6.apply(HadoopRDD.scala:179) >
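The failure mode above is generic to static initializers: whatever a static block does runs the moment the class is loaded, in every JVM that loads it, so 1000 executors touching the class means 1000 metastore connections. A hedged sketch (illustrative names, not Hive's or Spark's code) contrasting that with deferring the work via the initialization-on-demand holder idiom:

```java
// Eager: the static block fires on class load -- the pattern HIVE-2573
// introduced; every JVM that touches the class "connects" immediately.
class EagerClient {
    static boolean connected = false;
    static { connected = true; /* stand-in for reloadFunctions() hitting the metastore */ }
    static String query() { return "ok"; }
}

// Lazy: the connection happens only on first real use, because the nested
// Holder class is not initialized until Holder.CONN is first accessed.
class LazyClient {
    static boolean connected = false;
    private static class Holder {
        static final String CONN = open();
    }
    private static String open() { connected = true; return "conn"; }
    static String query() { return Holder.CONN; }
}

public class InitDemo {
    public static void main(String[] args) {
        EagerClient.query();
        System.out.println(EagerClient.connected); // true: connected at class load

        System.out.println(LazyClient.connected);  // false: nothing used yet
        LazyClient.query();
        System.out.println(LazyClient.connected);  // true: connected on demand
    }
}
```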
[jira] [Commented] (SPARK-11013) SparkPlan may mistakenly register child plan's accumulators for SQL metrics
[ https://issues.apache.org/jira/browse/SPARK-11013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951606#comment-14951606 ] Shixiong Zhu commented on SPARK-11013: -- I see. So we implement something like {{LongMinAccumulableParam}}, we can use `stringValue` to display "-" for {{None}}. What do you think? > SparkPlan may mistakenly register child plan's accumulators for SQL metrics > --- > > Key: SPARK-11013 > URL: https://issues.apache.org/jira/browse/SPARK-11013 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Wenchen Fan > > The reason is that: when we call RDD API inside SparkPlan, we are very likely > to reference the SparkPlan in the closure and thus serialize and transfer a > SparkPlan tree to executor side. When we deserialize it, the accumulators in > child SparkPlan are also deserialized and registered, and always report zero > value. > This is not a problem currently because we only have one operation to > aggregate the accumulators: add. However, if we wanna support more complex > metric like min, the extra zero values will lead to wrong result. > Take TungstenAggregate as an example, I logged "stageId, partitionId, > accumName, accumId" when an accumulator is deserialized and registered, and > logged the "accumId -> accumValue" map when a task ends. 
The output is: > {code} > scala> val df = Seq(1 -> "a", 2 -> "b").toDF("a", "b").groupBy().count() > df: org.apache.spark.sql.DataFrame = [count: bigint] > scala> df.collect > register: 0 0 Some(number of input rows) 4 > register: 0 0 Some(number of output rows) 5 > register: 1 0 Some(number of input rows) 4 > register: 1 0 Some(number of output rows) 5 > register: 1 0 Some(number of input rows) 2 > register: 1 0 Some(number of output rows) 3 > Map(5 -> 1, 4 -> 2, 6 -> 4458496) > Map(5 -> 0, 2 -> 1, 7 -> 4458496, 3 -> 1, 4 -> 0) > res0: Array[org.apache.spark.sql.Row] = Array([2]) > {code} > The best choice is to avoid serialize and deserialize a SparkPlan tree, which > can be achieved by LocalNode. > Or we can do some workaround to fix this serialization problem for the > problematic SparkPlans like TungstenAggregate, TungstenSort. > Or we can improve the SQL metrics framework to make it more robust to this > case. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-11013) SparkPlan may mistakenly register child plan's accumulators for SQL metrics
[ https://issues.apache.org/jira/browse/SPARK-11013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951606#comment-14951606 ] Shixiong Zhu edited comment on SPARK-11013 at 10/10/15 5:12 AM: I see. So if we implement something like {{LongMinAccumulableParam}}, we can use `stringValue` to display "-" for {{None}}. What do you think? was (Author: zsxwing): I see. So we implement something like {{LongMinAccumulableParam}}, we can use `stringValue` to display "-" for {{None}}. What do you think? > SparkPlan may mistakenly register child plan's accumulators for SQL metrics > --- > > Key: SPARK-11013 > URL: https://issues.apache.org/jira/browse/SPARK-11013 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Wenchen Fan > > The reason is that: when we call RDD API inside SparkPlan, we are very likely > to reference the SparkPlan in the closure and thus serialize and transfer a > SparkPlan tree to executor side. When we deserialize it, the accumulators in > child SparkPlan are also deserialized and registered, and always report zero > value. > This is not a problem currently because we only have one operation to > aggregate the accumulators: add. However, if we wanna support more complex > metric like min, the extra zero values will lead to wrong result. > Take TungstenAggregate as an example, I logged "stageId, partitionId, > accumName, accumId" when an accumulator is deserialized and registered, and > logged the "accumId -> accumValue" map when a task ends. 
The output is: > {code} > scala> val df = Seq(1 -> "a", 2 -> "b").toDF("a", "b").groupBy().count() > df: org.apache.spark.sql.DataFrame = [count: bigint] > scala> df.collect > register: 0 0 Some(number of input rows) 4 > register: 0 0 Some(number of output rows) 5 > register: 1 0 Some(number of input rows) 4 > register: 1 0 Some(number of output rows) 5 > register: 1 0 Some(number of input rows) 2 > register: 1 0 Some(number of output rows) 3 > Map(5 -> 1, 4 -> 2, 6 -> 4458496) > Map(5 -> 0, 2 -> 1, 7 -> 4458496, 3 -> 1, 4 -> 0) > res0: Array[org.apache.spark.sql.Row] = Array([2]) > {code} > The best choice is to avoid serialize and deserialize a SparkPlan tree, which > can be achieved by LocalNode. > Or we can do some workaround to fix this serialization problem for the > problematic SparkPlans like TungstenAggregate, TungstenSort. > Or we can improve the SQL metrics framework to make it more robust to this > case. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
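The hazard is concrete with a toy metric: spurious zero-valued registrations from deserialized child plans are invisible to an additive metric but poison a min. And per the comment above, an Option-backed min (a hypothetical {{LongMinAccumulableParam}}-style param; names here are illustrative, not Spark's accumulator API) can render "-" when no value was ever recorded.

```java
import java.util.Optional;

// Illustrative sketch, not Spark's accumulator API: an Option-backed min
// metric that starts empty instead of at zero, and prints "-" when unset.
class MinMetric {
    private Optional<Long> value = Optional.empty();
    void add(long v) { value = Optional.of(value.map(x -> Math.min(x, v)).orElse(v)); }
    String stringValue() { return value.map(String::valueOf).orElse("-"); }
}

public class MetricDemo {
    public static void main(String[] args) {
        long[] taskValues = {5, 3};

        // Zero-initialized metrics: a deserialized child plan registers an
        // extra accumulator that contributes a spurious 0.
        long sum = 0, min = 0; // 0 is the spurious registration's value
        for (long v : taskValues) { sum += v; min = Math.min(min, v); }
        System.out.println(sum); // 8: "add" absorbs the extra zero harmlessly
        System.out.println(min); // 0: wrong -- the real minimum is 3

        // Option-backed min: an empty registration contributes nothing.
        MinMetric m = new MinMetric();
        System.out.println(m.stringValue()); // "-" while no value recorded
        for (long v : taskValues) m.add(v);
        System.out.println(m.stringValue()); // "3"
    }
}
```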
[jira] [Resolved] (SPARK-10927) Spark history uses the application name instead of the ID
[ https://issues.apache.org/jira/browse/SPARK-10927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Baptiste Onofré resolved SPARK-10927. -- Resolution: Duplicate > Spark history uses the application name instead of the ID > - > > Key: SPARK-10927 > URL: https://issues.apache.org/jira/browse/SPARK-10927 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.5.1 >Reporter: Jean-Baptiste Onofré > > Setting spark.eventLog.enabled to true, and a folder location for > spark.eventLog.dir provides the history UI for completed jobs. > It works fine for jobs without arguments, but if the job expects some > arguments (like JavaWordCount, which expects the source file location), the UI > cannot provide application details: > {code} > Application history not found (app-20151005185136-0002) > No event logs found for application JavaWordCount in file:/tmp/spark. Did you > specify the correct logging directory? > {code} > However, in /tmp/spark, the file app-20151005185136-0002 is there. It seems > that the UI uses the application name (JavaWordCount) instead of the > application ID (app-20151005185136-0002) to get history details. > I will work on a fix for that.
[jira] [Commented] (SPARK-6847) Stack overflow on updateStateByKey which followed by a dstream with checkpoint set
[ https://issues.apache.org/jira/browse/SPARK-6847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951593#comment-14951593 ] Jack Hu commented on SPARK-6847: Hi [~glyton.camilleri] You can check whether there are two dstreams in the DAG that need to be checkpointed (updateStateByKey, reduceByKeyAndWindow); if yes, you can work around this by adding some output for the previous DStream that needs to be checkpointed. {code} val d1 = input.updateStateByKey(func) val d2 = d1.map(...).updateStateByKey(func) d2.foreachRDD(rdd => print(rdd.count)) /// workaround for the stack overflow listed in this JIRA d1.foreachRDD(rdd => rdd.foreach(_ => Unit)) {code} > Stack overflow on updateStateByKey which followed by a dstream with > checkpoint set > -- > > Key: SPARK-6847 > URL: https://issues.apache.org/jira/browse/SPARK-6847 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.3.0 >Reporter: Jack Hu > Labels: StackOverflowError, Streaming > > The issue happens with the following sample code: uses {{updateStateByKey}} > followed by a {{map}} with checkpoint interval 10 seconds > {code} > val sparkConf = new SparkConf().setAppName("test") > val streamingContext = new StreamingContext(sparkConf, Seconds(10)) > streamingContext.checkpoint("""checkpoint""") > val source = streamingContext.socketTextStream("localhost", ) > val updatedResult = source.map( > (1,_)).updateStateByKey( > (newlist : Seq[String], oldstate : Option[String]) => > newlist.headOption.orElse(oldstate)) > updatedResult.map(_._2) > .checkpoint(Seconds(10)) > .foreachRDD((rdd, t) => { > println("Deep: " + rdd.toDebugString.split("\n").length) > println(t.toString() + ": " + rdd.collect.length) > }) > streamingContext.start() > streamingContext.awaitTermination() > {code} > From the output, we can see that the dependency chain keeps growing over > time, the {{updateStateByKey}} never gets checkpointed, and finally, the > stack overflow will happen.
> Note: > * The rdd in {{updatedResult.map(_._2)}} gets checkpointed in this case, but > not the {{updateStateByKey}} > * If we remove the {{checkpoint(Seconds(10))}} from the map result ( > {{updatedResult.map(_._2)}} ), the stack overflow will not happen
[jira] [Commented] (SPARK-6613) Starting stream from checkpoint causes Streaming tab to throw error
[ https://issues.apache.org/jira/browse/SPARK-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951610#comment-14951610 ] Yutao SUN commented on SPARK-6613: -- Same issue in 1.5.0 > Starting stream from checkpoint causes Streaming tab to throw error > --- > > Key: SPARK-6613 > URL: https://issues.apache.org/jira/browse/SPARK-6613 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.2.1, 1.2.2, 1.3.1 >Reporter: Marius Soutier > > When continuing my streaming job from a checkpoint, the job runs, but the > Streaming tab in the standard UI initially no longer works (browser just > shows HTTP ERROR: 500). Sometimes it gets back to normal after a while, and > sometimes it stays in this state permanently. > Stacktrace: > WARN org.eclipse.jetty.servlet.ServletHandler: /streaming/ > java.util.NoSuchElementException: key not found: 0 > at scala.collection.MapLike$class.default(MapLike.scala:228) > at scala.collection.AbstractMap.default(Map.scala:58) > at scala.collection.MapLike$class.apply(MapLike.scala:141) > at scala.collection.AbstractMap.apply(Map.scala:58) > at > org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1$$anonfun$apply$5.apply(StreamingJobProgressListener.scala:151) > at > org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1$$anonfun$apply$5.apply(StreamingJobProgressListener.scala:150) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at scala.collection.immutable.Range.foreach(Range.scala:141) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) > at scala.collection.AbstractTraversable.map(Traversable.scala:105) > at > org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1.apply(StreamingJobProgressListener.scala:150) > at > 
org.apache.spark.streaming.ui.StreamingJobProgressListener$$anonfun$lastReceivedBatchRecords$1.apply(StreamingJobProgressListener.scala:149) > at scala.Option.map(Option.scala:145) > at > org.apache.spark.streaming.ui.StreamingJobProgressListener.lastReceivedBatchRecords(StreamingJobProgressListener.scala:149) > at > org.apache.spark.streaming.ui.StreamingPage.generateReceiverStats(StreamingPage.scala:82) > at > org.apache.spark.streaming.ui.StreamingPage.render(StreamingPage.scala:43) > at org.apache.spark.ui.WebUI$$anonfun$attachPage$1.apply(WebUI.scala:68) > at org.apache.spark.ui.WebUI$$anonfun$attachPage$1.apply(WebUI.scala:68) > at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:68) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:735) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:848) > at > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) > at org.eclipse.jetty.server.Server.handle(Server.java:370) > at > org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494) > at > org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971) > at > org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033) > at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644) > 
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) > at > org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82) > at > org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667) > at > org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) > at