[jira] [Commented] (SPARK-10969) Spark Streaming Kinesis: Allow specifying separate credentials for Kinesis and DynamoDB
[ https://issues.apache.org/jira/browse/SPARK-10969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266430#comment-16266430 ] Christoph Pirkl commented on SPARK-10969: - Thanks for the information, this solves my problem. You can close the ticket. > Spark Streaming Kinesis: Allow specifying separate credentials for Kinesis > and DynamoDB > --- > > Key: SPARK-10969 > URL: https://issues.apache.org/jira/browse/SPARK-10969 > Project: Spark > Issue Type: Improvement > Components: DStreams >Affects Versions: 1.5.1 >Reporter: Christoph Pirkl >Priority: Critical > > {{KinesisUtils.createStream()}} allows specifying only one set of AWS > credentials that will be used by Amazon KCL for accessing Kinesis, DynamoDB > and CloudWatch. > h5. Motivation > In a scenario where one needs to read from a Kinesis Stream owned by a > different AWS account the user usually has minimal rights (i.e. only read > from the stream). In this case creating the DynamoDB table in KCL will fail. > h5. Proposal > My proposed solution would be to allow specifying multiple credentials in > {{KinesisUtils.createStream()}} for Kinesis, DynamoDB and CloudWatch. The > additional credentials could then be passed to the constructor of > {{KinesisClientLibConfiguration}} or method > {{KinesisClientLibConfiguration.withDynamoDBClientConfig()}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-22602) remove ColumnVector#loadBytes
[ https://issues.apache.org/jira/browse/SPARK-22602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-22602. - Resolution: Fixed Fix Version/s: 2.3.0 > remove ColumnVector#loadBytes > - > > Key: SPARK-22602 > URL: https://issues.apache.org/jira/browse/SPARK-22602 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 2.3.0 > Reporter: Wenchen Fan > Assignee: Wenchen Fan > Fix For: 2.3.0
[jira] [Commented] (SPARK-22611) Spark Kinesis ProvisionedThroughputExceededException leads to dropped records
[ https://issues.apache.org/jira/browse/SPARK-22611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266389#comment-16266389 ] Richard Moorhead commented on SPARK-22611: -- That's correct. The Spark Streaming UI shows a greater number than the actual number of records instantiated by the KinesisBackedBlockRDD during actions invoked in `foreachRdd`. It appears that individual sequence numbers are skipped when throughput exceptions occur. > Spark Kinesis ProvisionedThroughputExceededException leads to dropped records > - > > Key: SPARK-22611 > URL: https://issues.apache.org/jira/browse/SPARK-22611 > Project: Spark > Issue Type: Bug > Components: DStreams > Affects Versions: 2.2.0 > Reporter: Richard Moorhead > > I've loaded a Kinesis stream with a single shard with ~20M records and have created a simple Spark Streaming application that writes those records to S3. When the streaming interval is set sufficiently wide such that 2MB/s read rates are violated, the receiver's KCL processes throw ProvisionedThroughputExceededExceptions. While these exceptions are expected, the output record counts in S3 do not match the record counts in the Spark Streaming UI and, worse, the records never appear to be fetched in future batches. This problem can be mitigated by setting the streaming interval to a narrow window such that batches are small enough that throughput limits aren't exceeded, but this isn't guaranteed in a production system.
[jira] [Comment Edited] (SPARK-10969) Spark Streaming Kinesis: Allow specifying separate credentials for Kinesis and DynamoDB
[ https://issues.apache.org/jira/browse/SPARK-10969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266374#comment-16266374 ] Grega Kespret edited comment on SPARK-10969 at 11/27/17 5:33 AM: - I believe this is now possible through the {{KinesisInputDStream.Builder}} class added in SPARK-19911 as: {code} KinesisInputDStream.builder ... .kinesisCredentials(creds1) .dynamoDBCredentials(creds2) .cloudWatchCredentials(creds3) .build {code} Close the ticket? cc [~brkyvz] was (Author: gregak): I believe this is now possible through the {{KinesisInputDStream.Builder}} class added in SPARK-19911 as: {code} KinesisInputDStream.builder ... .kinesisCredentials(creds1) .dynamoDBCredentials(creds2) .cloudWatchCredentials(creds3) .build {code} Close the ticket? > Spark Streaming Kinesis: Allow specifying separate credentials for Kinesis > and DynamoDB > --- > > Key: SPARK-10969 > URL: https://issues.apache.org/jira/browse/SPARK-10969 > Project: Spark > Issue Type: Improvement > Components: DStreams >Affects Versions: 1.5.1 >Reporter: Christoph Pirkl >Priority: Critical > > {{KinesisUtils.createStream()}} allows specifying only one set of AWS > credentials that will be used by Amazon KCL for accessing Kinesis, DynamoDB > and CloudWatch. > h5. Motivation > In a scenario where one needs to read from a Kinesis Stream owned by a > different AWS account the user usually has minimal rights (i.e. only read > from the stream). In this case creating the DynamoDB table in KCL will fail. > h5. Proposal > My proposed solution would be to allow specifying multiple credentials in > {{KinesisUtils.createStream()}} for Kinesis, DynamoDB and CloudWatch. The > additional credentials could then be passed to the constructor of > {{KinesisClientLibConfiguration}} or method > {{KinesisClientLibConfiguration.withDynamoDBClientConfig()}}. 
[jira] [Commented] (SPARK-10969) Spark Streaming Kinesis: Allow specifying separate credentials for Kinesis and DynamoDB
[ https://issues.apache.org/jira/browse/SPARK-10969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266374#comment-16266374 ] Grega Kespret commented on SPARK-10969: --- I believe this is now possible through the {{KinesisInputDStream.Builder}} class as: {code} KinesisInputDStream.builder ... .kinesisCredentials(creds1) .dynamoDBCredentials(creds2) .cloudWatchCredentials(creds3) .build {code} Close the ticket? > Spark Streaming Kinesis: Allow specifying separate credentials for Kinesis > and DynamoDB > --- > > Key: SPARK-10969 > URL: https://issues.apache.org/jira/browse/SPARK-10969 > Project: Spark > Issue Type: Improvement > Components: DStreams > Affects Versions: 1.5.1 > Reporter: Christoph Pirkl > Priority: Critical > > {{KinesisUtils.createStream()}} allows specifying only one set of AWS credentials that will be used by Amazon KCL for accessing Kinesis, DynamoDB and CloudWatch. > h5. Motivation > In a scenario where one needs to read from a Kinesis Stream owned by a different AWS account, the user usually has minimal rights (i.e. only read from the stream). In this case creating the DynamoDB table in KCL will fail. > h5. Proposal > My proposed solution would be to allow specifying multiple credentials in {{KinesisUtils.createStream()}} for Kinesis, DynamoDB and CloudWatch. The additional credentials could then be passed to the constructor of {{KinesisClientLibConfiguration}} or the method {{KinesisClientLibConfiguration.withDynamoDBClientConfig()}}.
[jira] [Commented] (SPARK-7721) Generate test coverage report from Python
[ https://issues.apache.org/jira/browse/SPARK-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266358#comment-16266358 ] Reynold Xin commented on SPARK-7721: This is really cool. I took a look, but it looks like doctests are missing? For example, sortWithinPartitions is labeled as missing, but there is a doctest for it. > Generate test coverage report from Python > - > > Key: SPARK-7721 > URL: https://issues.apache.org/jira/browse/SPARK-7721 > Project: Spark > Issue Type: Test > Components: PySpark, Tests > Reporter: Reynold Xin > > Would be great to have a test coverage report for Python. Compared with Scala, it is trickier to understand the coverage without coverage reports in Python because we employ both docstring tests and unit tests in test files.
[jira] [Created] (SPARK-22612) NullPointerException in AppendOnlyMap
Lijie Xu created SPARK-22612: Summary: NullPointerException in AppendOnlyMap Key: SPARK-22612 URL: https://issues.apache.org/jira/browse/SPARK-22612 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.1.2, 2.1.1 Reporter: Lijie Xu I recently encountered a NullPointerException in AppendOnlyMap while running the SparkPageRank example in the package org.apache.spark.examples.
{code:java}
17/11/25 16:31:13 ERROR Executor: Exception in task 30.0 in stage 10.0 (TID 417)
java.lang.NullPointerException
at scala.Tuple2.equals(Tuple2.scala:20)
at org.apache.spark.util.collection.AppendOnlyMap.changeValue(AppendOnlyMap.scala:149)
at org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:194)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
17/11/25 16:31:13 INFO Executor: Executor is trying to kill task 24.0 in stage 10.0 (TID 409)
{code}
The corresponding AppendOnlyMap code is as follows.
{code:java}
while (true) {
  val curKey = data(2 * pos)
  if (curKey.eq(null)) {
    val newValue = updateFunc(false, null.asInstanceOf[V])
    data(2 * pos) = k
    data(2 * pos + 1) = newValue.asInstanceOf[AnyRef]
    incrementSize()
    return newValue
  } else if (k.eq(curKey) || k.equals(curKey)) { // NullPointerException in this line (149)
    val newValue = updateFunc(true, data(2 * pos + 1).asInstanceOf[V])
    data(2 * pos + 1) = newValue.asInstanceOf[AnyRef]
    return newValue
  } else {
    val delta = i
    pos = (pos + delta) & mask
    i += 1
  }
}
{code}
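For context, the loop above implements open addressing over a flat Array[AnyRef] that stores keys and values in alternating slots. The following is a minimal, self-contained sketch of the same scheme; the class name TinyAppendOnlyMap is hypothetical, and it omits Spark's null-key handling and growth/rehashing, so it is an illustration rather than the real class:

```scala
// Sketch of AppendOnlyMap-style open addressing (simplified; the real Spark
// class also handles a null key and grows/rehashes the backing array).
class TinyAppendOnlyMap[K, V](capacity: Int = 64) {
  require((capacity & (capacity - 1)) == 0, "capacity must be a power of 2")
  private val mask = capacity - 1
  // keys at even indices, values at odd indices
  private val data = new Array[AnyRef](2 * capacity)

  def changeValue(k: K, updateFunc: (Boolean, V) => V): V = {
    var pos = k.hashCode & mask
    var i = 1
    while (true) {
      val curKey = data(2 * pos)
      if (curKey eq null) {
        // empty slot: insert a new entry
        val newValue = updateFunc(false, null.asInstanceOf[V])
        data(2 * pos) = k.asInstanceOf[AnyRef]
        data(2 * pos + 1) = newValue.asInstanceOf[AnyRef]
        return newValue
      } else if (k.asInstanceOf[AnyRef].eq(curKey) || k == curKey) {
        // same key: update in place (this is where Spark's line 149 compares keys)
        val newValue = updateFunc(true, data(2 * pos + 1).asInstanceOf[V])
        data(2 * pos + 1) = newValue.asInstanceOf[AnyRef]
        return newValue
      } else {
        // collision: quadratic probing
        pos = (pos + i) & mask
        i += 1
      }
    }
    throw new IllegalStateException("unreachable")
  }
}
```

For example, `changeValue("a", (found, old) => if (found) old + 1 else 1)` on a `TinyAppendOnlyMap[String, Int]` implements counting.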
[jira] [Commented] (SPARK-22611) Spark Kinesis ProvisionedThroughputExceededException leads to dropped records
[ https://issues.apache.org/jira/browse/SPARK-22611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266333#comment-16266333 ] Sean Owen commented on SPARK-22611: --- What's the Spark issue here -- somehow the error causes the records to be considered processed when they didn't make it to user code? > Spark Kinesis ProvisionedThroughputExceededException leads to dropped records > - > > Key: SPARK-22611 > URL: https://issues.apache.org/jira/browse/SPARK-22611 > Project: Spark > Issue Type: Bug > Components: DStreams > Affects Versions: 2.2.0 > Reporter: Richard Moorhead > > I've loaded a Kinesis stream with a single shard with ~20M records and have created a simple Spark Streaming application that writes those records to S3. When the streaming interval is set sufficiently wide such that 2MB/s read rates are violated, the receiver's KCL processes throw ProvisionedThroughputExceededExceptions. While these exceptions are expected, the output record counts in S3 do not match the record counts in the Spark Streaming UI and, worse, the records never appear to be fetched in future batches. This problem can be mitigated by setting the streaming interval to a narrow window such that batches are small enough that throughput limits aren't exceeded, but this isn't guaranteed in a production system.
[jira] [Resolved] (SPARK-22610) AM restart in a other node send Jobs into a state of feign death
[ https://issues.apache.org/jira/browse/SPARK-22610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-22610. --- Resolution: Invalid Questions to the mailing list, please. > AM restart in a other node send Jobs into a state of feign death > > > Key: SPARK-22610 > URL: https://issues.apache.org/jira/browse/SPARK-22610 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.1.1 > Reporter: Bang Xiao > Priority: Minor > > I run "spark-sql --master yarn --deploy-mode client -f 'SQLs'" in a shell. The application gets stuck when the AM goes down and is restarted on another node. It seems the driver waits for the next SQL statement. Is this a bug? In my opinion, the application should either re-execute the failed SQL or exit with a failure when the AM restarts.
[jira] [Created] (SPARK-22611) Spark Kinesis ProvisionedThroughputExceededException leads to dropped records
Richard Moorhead created SPARK-22611: Summary: Spark Kinesis ProvisionedThroughputExceededException leads to dropped records Key: SPARK-22611 URL: https://issues.apache.org/jira/browse/SPARK-22611 Project: Spark Issue Type: Bug Components: DStreams Affects Versions: 2.2.0 Reporter: Richard Moorhead I've loaded a Kinesis stream with a single shard with ~20M records and have created a simple Spark Streaming application that writes those records to S3. When the streaming interval is set sufficiently wide such that 2MB/s read rates are violated, the receiver's KCL processes throw ProvisionedThroughputExceededExceptions. While these exceptions are expected, the output record counts in S3 do not match the record counts in the Spark Streaming UI and, worse, the records never appear to be fetched in future batches. This problem can be mitigated by setting the streaming interval to a narrow window such that batches are small enough that throughput limits aren't exceeded, but this isn't guaranteed in a production system.
[jira] [Created] (SPARK-22610) AM restart in a other node send Jobs into a state of feign death
Bang Xiao created SPARK-22610: - Summary: AM restart in a other node send Jobs into a state of feign death Key: SPARK-22610 URL: https://issues.apache.org/jira/browse/SPARK-22610 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.1.1 Reporter: Bang Xiao Priority: Minor I run "spark-sql --master yarn --deploy-mode client -f 'SQLs'" in a shell. The application gets stuck when the AM goes down and is restarted on another node. It seems the driver waits for the next SQL statement. Is this a bug? In my opinion, the application should either re-execute the failed SQL or exit with a failure when the AM restarts.
[jira] [Closed] (SPARK-22609) Reuse CodeGeneration.nullSafeExec when possible
[ https://issues.apache.org/jira/browse/SPARK-22609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido closed SPARK-22609. --- > Reuse CodeGeneration.nullSafeExec when possible > --- > > Key: SPARK-22609 > URL: https://issues.apache.org/jira/browse/SPARK-22609 > Project: Spark > Issue Type: Task > Components: SQL > Affects Versions: 2.3.0 > Reporter: Marco Gaido > Priority: Trivial > > There are several places in the code where `CodeGeneration.nullSafeExec` could be used, but it is not. This makes the generated code contain a lot of useless blocks like:
> {code}
> if (!false) {
>   // some code here
> }
> {code}
[jira] [Resolved] (SPARK-22609) Reuse CodeGeneration.nullSafeExec when possible
[ https://issues.apache.org/jira/browse/SPARK-22609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido resolved SPARK-22609. - Resolution: Invalid > Reuse CodeGeneration.nullSafeExec when possible > --- > > Key: SPARK-22609 > URL: https://issues.apache.org/jira/browse/SPARK-22609 > Project: Spark > Issue Type: Task > Components: SQL > Affects Versions: 2.3.0 > Reporter: Marco Gaido > Priority: Trivial > > There are several places in the code where `CodeGeneration.nullSafeExec` could be used, but it is not. This makes the generated code contain a lot of useless blocks like:
> {code}
> if (!false) {
>   // some code here
> }
> {code}
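To illustrate why the issue matters: the helper in question wraps generated Java code in a null guard only when the input can actually be null. A hedged sketch of such a helper follows; the signature is modeled on Spark's {{CodeGenerator.nullSafeExec}} but is recalled from memory and may differ across versions:

```scala
// Sketch of a nullSafeExec-style helper (modeled on Spark's CodeGenerator,
// not copied from it). `isNull` is the name of a generated boolean variable,
// and `execute` is the generated Java code to guard.
def nullSafeExec(nullable: Boolean, isNull: String)(execute: String): String = {
  if (nullable) {
    // the input may be null at runtime: guard the generated code
    s"""
       |if (!$isNull) {
       |  $execute
       |}
     """.stripMargin
  } else {
    // the input can never be null: emit the code unguarded, which avoids
    // the useless `if (!false) { ... }` blocks the issue describes
    execute
  }
}
```

Call sites that inline the guard by hand always emit the `if`, which is where the `if (!false)` noise comes from.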
[jira] [Assigned] (SPARK-22601) Data load is getting displayed successful on providing non existing hdfs file path
[ https://issues.apache.org/jira/browse/SPARK-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22601: Assignee: (was: Apache Spark) > Data load is getting displayed successful on providing non existing hdfs file > path > -- > > Key: SPARK-22601 > URL: https://issues.apache.org/jira/browse/SPARK-22601 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 2.2.0 > Reporter: Sujith > Priority: Minor > > Data load is reported as successful when a non-existing HDFS file path is provided, whereas for a local path a proper error message is displayed:
> {code}
> create table tb2 (a string, b int);
> load data inpath 'hdfs://hacluster/data1.csv' into table tb2
> {code}
> Note: data1.csv does not exist in HDFS. When a non-existing local file path is given, the error message "LOAD DATA input path does not exist" is displayed. Attached are snapshots of the behaviour in Spark 2.1 and Spark 2.2.
[jira] [Assigned] (SPARK-22601) Data load is getting displayed successful on providing non existing hdfs file path
[ https://issues.apache.org/jira/browse/SPARK-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22601: Assignee: Apache Spark > Data load is getting displayed successful on providing non existing hdfs file > path > -- > > Key: SPARK-22601 > URL: https://issues.apache.org/jira/browse/SPARK-22601 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 2.2.0 > Reporter: Sujith > Assignee: Apache Spark > Priority: Minor > > Data load is reported as successful when a non-existing HDFS file path is provided, whereas for a local path a proper error message is displayed:
> {code}
> create table tb2 (a string, b int);
> load data inpath 'hdfs://hacluster/data1.csv' into table tb2
> {code}
> Note: data1.csv does not exist in HDFS. When a non-existing local file path is given, the error message "LOAD DATA input path does not exist" is displayed. Attached are snapshots of the behaviour in Spark 2.1 and Spark 2.2.
[jira] [Commented] (SPARK-22601) Data load is getting displayed successful on providing non existing hdfs file path
[ https://issues.apache.org/jira/browse/SPARK-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266123#comment-16266123 ] Apache Spark commented on SPARK-22601: -- User 'sujith71955' has created a pull request for this issue: https://github.com/apache/spark/pull/19823 > Data load is getting displayed successful on providing non existing hdfs file > path > -- > > Key: SPARK-22601 > URL: https://issues.apache.org/jira/browse/SPARK-22601 > Project: Spark > Issue Type: Improvement > Components: Spark Core > Affects Versions: 2.2.0 > Reporter: Sujith > Priority: Minor > > Data load is reported as successful when a non-existing HDFS file path is provided, whereas for a local path a proper error message is displayed:
> {code}
> create table tb2 (a string, b int);
> load data inpath 'hdfs://hacluster/data1.csv' into table tb2
> {code}
> Note: data1.csv does not exist in HDFS. When a non-existing local file path is given, the error message "LOAD DATA input path does not exist" is displayed. Attached are snapshots of the behaviour in Spark 2.1 and Spark 2.2.
[jira] [Commented] (SPARK-22579) BlockManager.getRemoteValues and BlockManager.getRemoteBytes should be implemented using streaming
[ https://issues.apache.org/jira/browse/SPARK-22579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266102#comment-16266102 ] Eyal Farago commented on SPARK-22579: - [~srowen], I couldn't see a place where this data is stored for later use. Even in the 'big fetch' scenario the file isn't reused, as org.apache.spark.storage.BlockManager#remoteBlockTempFileManager is only used to stream a big file once; it's never used to figure out if the download can be avoided. Furthermore, these blocks are also not registered with the block manager; after all, they could serve as a 'DISK_ONLY' cache on the requesting executor and potentially even for other executors residing on the same node. [~jerryshao], it seems the 'big files' code path actually uses streaming, which makes me think it could be slightly modified to accept a stream handler (current behavior could be achieved by passing in a download stream handler). One thing that does have the potential for trouble is error recovery when more than one executor can serve the block: the current implementation simply moves to the next one, while a streaming-based approach would have to request the rest of the block from another executor (assuming the blocks are identical). > BlockManager.getRemoteValues and BlockManager.getRemoteBytes should be > implemented using streaming > -- > > Key: SPARK-22579 > URL: https://issues.apache.org/jira/browse/SPARK-22579 > Project: Spark > Issue Type: Improvement > Components: Block Manager, Spark Core > Affects Versions: 2.1.0 > Reporter: Eyal Farago > > When an RDD partition is cached on an executor but the task requiring it is running on another executor (process locality ANY), the cached partition is fetched via BlockManager.getRemoteValues, which delegates to BlockManager.getRemoteBytes; both calls are blocking. > In my use case I had a 700GB RDD spread over 1000 partitions on a 6-node cluster, cached to disk. Rough math shows that the average partition size is 700MB.
> Looking at the Spark UI it was obvious that tasks running with process locality 'ANY' are much slower than local tasks (~40 seconds to 8-10 minutes ratio). I was able to capture thread dumps of executors executing remote tasks and got this stack trace:
> {quote}
> Thread ID: 1521  Thread Name: Executor task launch worker-1000  Thread State: WAITING  Thread Locks: Lock(java.util.concurrent.ThreadPoolExecutor$Worker@196462978})
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
> scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
> scala.concurrent.Await$.result(package.scala:190)
> org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:190)
> org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:104)
> org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:582)
> org.apache.spark.storage.BlockManager.getRemoteValues(BlockManager.scala:550)
> org.apache.spark.storage.BlockManager.get(BlockManager.scala:638)
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:690)
> org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
> org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:287){quote}
> Digging into the code showed that the block manager first fetches all the bytes (getRemoteBytes) and then wraps them with a deserialization stream. This has several drawbacks:
> 1. Blocking: the requesting executor is blocked while the remote executor is serving the block.
> 2. Potentially large memory footprint on the requesting executor; in my use case, 700MB of raw bytes stored in a ChunkedByteBuffer.
> 3. Inefficient: the requesting side usually doesn't need
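The fetch-then-deserialize pattern described in the issue, versus the streaming alternative the comments argue for, can be sketched as follows. This is a hypothetical illustration: fetchAllBytes, openStream, and deserializeStream stand in for the real transfer and serializer layers and are not actual BlockManager methods.

```scala
import java.io.{ByteArrayInputStream, InputStream}

// Hypothetical sketch contrasting the two approaches discussed above.
// The three helpers are assumptions standing in for Spark's internals.
object RemoteBlockFetchSketch {
  def fetchAllBytes(blockId: String): Array[Byte] = ???          // blocking full transfer
  def openStream(blockId: String): InputStream = ???             // incremental transfer
  def deserializeStream(in: InputStream): Iterator[AnyRef] = ??? // wraps, does not buffer

  // Current behavior: materialize the whole serialized block in memory,
  // then deserialize. With ~700MB partitions, the full block sits on the heap.
  def getRemoteValuesBuffered(blockId: String): Iterator[AnyRef] =
    deserializeStream(new ByteArrayInputStream(fetchAllBytes(blockId)))

  // Proposed behavior: deserialize records as bytes arrive. Note the
  // error-recovery caveat from the comment: a partially consumed stream cannot
  // simply be retried from another executor unless the replicas are
  // byte-identical and the consumed offset is tracked.
  def getRemoteValuesStreaming(blockId: String): Iterator[AnyRef] =
    deserializeStream(openStream(blockId))
}
```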
[jira] [Updated] (SPARK-22608) Avoid code duplication regarding CodeGeneration.splitExpressions()
[ https://issues.apache.org/jira/browse/SPARK-22608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-22608: - Issue Type: Improvement (was: Bug) > Avoid code duplication regarding CodeGeneration.splitExpressions() > -- > > Key: SPARK-22608 > URL: https://issues.apache.org/jira/browse/SPARK-22608 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 2.3.0 > Reporter: Kazuaki Ishizaki > > Since several calls to {{CodeGenerator.splitExpressions}} are used with {{ctx.INPUT_ROW}}, it would be good to prepare APIs for this to avoid code duplication.
[jira] [Updated] (SPARK-22608) Avoid code duplication regarding CodeGeneration.splitExpressions()
[ https://issues.apache.org/jira/browse/SPARK-22608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-22608: - Priority: Minor (was: Major) > Avoid code duplication regarding CodeGeneration.splitExpressions() > -- > > Key: SPARK-22608 > URL: https://issues.apache.org/jira/browse/SPARK-22608 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 2.3.0 > Reporter: Kazuaki Ishizaki > Priority: Minor > > Since several calls to {{CodeGenerator.splitExpressions}} are used with {{ctx.INPUT_ROW}}, it would be good to prepare APIs for this to avoid code duplication.
[jira] [Resolved] (SPARK-22573) SQL Planner is including unnecessary columns in the projection
[ https://issues.apache.org/jira/browse/SPARK-22573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-22573. - Resolution: Duplicate > SQL Planner is including unnecessary columns in the projection > -- > > Key: SPARK-22573 > URL: https://issues.apache.org/jira/browse/SPARK-22573 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.2.0 > Reporter: Rajkishore Hembram > > While I was running TPC-H query 18 for benchmarking, I observed that the query plan produced by Apache Spark 2.2.0 is less efficient than that of other versions of Apache Spark. I noticed that the other versions of Apache Spark (2.0.2 and 2.1.2) include only the required columns in the projections, but the query planner of Apache Spark 2.2.0 includes unnecessary columns in the projection for some of the queries, unnecessarily increasing the I/O. Because of that, Apache Spark 2.2.0 takes more time. > [Spark 2.1.2 TPC-H Query 18 Plan|https://drive.google.com/file/d/1_u8nPKG_SIM7P6fs0VK-8UEXIhWPY_BN/view] > [Spark 2.2.0 TPC-H Query 18 Plan|https://drive.google.com/file/d/1xtxG5Ext36djfTDSdf_W5vGbbdgRApPo/view] > TPC-H Query 18
> {code:java}
> select C_NAME,C_CUSTKEY,O_ORDERKEY,O_ORDERDATE,O_TOTALPRICE,sum(L_QUANTITY)
> from CUSTOMER,ORDERS,LINEITEM
> where O_ORDERKEY in ( select L_ORDERKEY from LINEITEM group by L_ORDERKEY having sum(L_QUANTITY) > 300 )
> and C_CUSTKEY = O_CUSTKEY and O_ORDERKEY = L_ORDERKEY
> group by C_NAME,C_CUSTKEY,O_ORDERKEY,O_ORDERDATE,O_TOTALPRICE
> order by O_TOTALPRICE desc,O_ORDERDATE
> {code}
[jira] [Resolved] (SPARK-22607) Set large stack size consistently for tests to avoid StackOverflowError
[ https://issues.apache.org/jira/browse/SPARK-22607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-22607. --- Resolution: Fixed Fix Version/s: 2.3.0 2.2.2 Issue resolved by pull request 19820 [https://github.com/apache/spark/pull/19820] > Set large stack size consistently for tests to avoid StackOverflowError > --- > > Key: SPARK-22607 > URL: https://issues.apache.org/jira/browse/SPARK-22607 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 2.2.0 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Minor > Fix For: 2.2.2, 2.3.0 > > > I was seeing this error while testing the 2.2.1 RC: > {code} > OrderingSuite: > ... > - GenerateOrdering with ShortType > *** RUN ABORTED *** > java.lang.StackOverflowError: > at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:370) > at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) > at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) > at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) > at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) > at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) > at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) > at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) > ... > {code} > This doesn't seem to happen on Jenkins, for whatever reason. It seems like we > set JVM flags for tests inconsistently, and in particular, only set a 4MB > stack size for surefire, not scalatest-maven-plugin. Adding {{-Xss4m}} made > the test pass for me. > We can also make sure that all of these pass {{-ea}} consistently.
[jira] [Updated] (SPARK-22607) Set large stack size consistently for tests to avoid StackOverflowError
[ https://issues.apache.org/jira/browse/SPARK-22607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-22607: -- Description: I was seeing this error while testing the 2.2.1 RC: {code} OrderingSuite: ... - GenerateOrdering with ShortType *** RUN ABORTED *** java.lang.StackOverflowError: at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:370) at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) ... {code} This doesn't seem to happen on Jenkins, for whatever reason. It seems like we set JVM flags for tests inconsistently, and in particular, only set a 4MB stack size for surefire, not scalatest-maven-plugin. Adding {{-Xss4m}} made the test pass for me. We can also make sure that all of these pass {{-ea}} consistently. was: I was seeing this error while testing the 2.2.1 RC: {code} java.lang.StackOverflowError: at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:370) at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) {code} This doesn't seem to happen on Jenkins, for whatever reason. 
It seems like we set JVM flags for tests inconsistently, and in particular, only set a 4MB stack size for surefire, not scalatest-maven-plugin. Adding {{-Xss4m}} made the test pass for me. We can also make sure that all of these pass {{-ea}} consistently. > Set large stack size consistently for tests to avoid StackOverflowError > --- > > Key: SPARK-22607 > URL: https://issues.apache.org/jira/browse/SPARK-22607 > Project: Spark > Issue Type: Bug > Components: Build, Tests >Affects Versions: 2.2.0 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Minor > > I was seeing this error while testing the 2.2.1 RC: > {code} > OrderingSuite: > ... > - GenerateOrdering with ShortType > *** RUN ABORTED *** > java.lang.StackOverflowError: > at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:370) > at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) > at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) > at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) > at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) > at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) > at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) > at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541) > ... > {code} > This doesn't seem to happen on Jenkins, for whatever reason. It seems like we > set JVM flags for tests inconsistently, and in particular, only set a 4MB > stack size for surefire, not scalatest-maven-plugin. Adding {{-Xss4m}} made > the test pass for me. > We can also make sure that all of these pass {{-ea}} consistently.
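The fix described above amounts to passing the same JVM flags to both test runners. A minimal sketch of what that consistent configuration might look like in a Maven pom.xml follows; this is an illustration of the idea, not Spark's actual build file, and the plugin coordinates shown are the standard ones rather than anything taken from this ticket:

```xml
<!-- Sketch: give both surefire (Java tests) and scalatest-maven-plugin
     (Scala tests) the same enlarged stack size and enable assertions,
     so the two test suites run under identical JVM settings. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <argLine>-ea -Xss4m</argLine>
  </configuration>
</plugin>
<plugin>
  <groupId>org.scalatest</groupId>
  <artifactId>scalatest-maven-plugin</artifactId>
  <configuration>
    <argLine>-ea -Xss4m</argLine>
  </configuration>
</plugin>
```

Both plugins accept an `argLine` parameter, so the flags can be factored into a single shared Maven property if desired.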
[jira] [Assigned] (SPARK-22609) Reuse CodeGeneration.nullSafeExec when possible
[ https://issues.apache.org/jira/browse/SPARK-22609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22609: Assignee: (was: Apache Spark) > Reuse CodeGeneration.nullSafeExec when possible > --- > > Key: SPARK-22609 > URL: https://issues.apache.org/jira/browse/SPARK-22609 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Marco Gaido >Priority: Trivial > > There are several places in the code where `CodeGeneration.nullSafeExec` > could be used but is not. This leaves the generated code containing a lot > of useless blocks like: > {code} > if (!false) { > // some code here > } > {code}
[jira] [Commented] (SPARK-22609) Reuse CodeGeneration.nullSafeExec when possible
[ https://issues.apache.org/jira/browse/SPARK-22609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266002#comment-16266002 ] Apache Spark commented on SPARK-22609: -- User 'mgaido91' has created a pull request for this issue: https://github.com/apache/spark/pull/19822 > Reuse CodeGeneration.nullSafeExec when possible > --- > > Key: SPARK-22609 > URL: https://issues.apache.org/jira/browse/SPARK-22609 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Marco Gaido >Priority: Trivial > > There are several places in the code where `CodeGeneration.nullSafeExec` > could be used but is not. This leaves the generated code containing a lot > of useless blocks like: > {code} > if (!false) { > // some code here > } > {code}
[jira] [Assigned] (SPARK-22609) Reuse CodeGeneration.nullSafeExec when possible
[ https://issues.apache.org/jira/browse/SPARK-22609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22609: Assignee: Apache Spark > Reuse CodeGeneration.nullSafeExec when possible > --- > > Key: SPARK-22609 > URL: https://issues.apache.org/jira/browse/SPARK-22609 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Marco Gaido >Assignee: Apache Spark >Priority: Trivial > > There are several places in the code where `CodeGeneration.nullSafeExec` > could be used but is not. This leaves the generated code containing a lot > of useless blocks like: > {code} > if (!false) { > // some code here > } > {code}
[jira] [Created] (SPARK-22609) Reuse CodeGeneration.nullSafeExec when possible
Marco Gaido created SPARK-22609: --- Summary: Reuse CodeGeneration.nullSafeExec when possible Key: SPARK-22609 URL: https://issues.apache.org/jira/browse/SPARK-22609 Project: Spark Issue Type: Task Components: SQL Affects Versions: 2.3.0 Reporter: Marco Gaido Priority: Trivial There are several places in the code where `CodeGeneration.nullSafeExec` could be used but is not. This leaves the generated code containing a lot of useless blocks like: {code} if (!false) { // some code here } {code}
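To make the pattern concrete, here is a small illustrative mock-up (the helper name and signature are hypothetical, not Spark's actual implementation) of what a nullSafeExec-style helper does: it emits a null guard only when the input can actually be null, so non-nullable inputs never produce dead `if (!false)` blocks in the generated source.

```java
// Hypothetical sketch of a null-safe code-generation helper: wrap the
// generated statement in a null guard only when the input is nullable.
public class NullSafeCodegenSketch {
    static String nullSafeExec(boolean nullable, String isNullVar, String body) {
        if (nullable) {
            // Guard is needed: only run the body when the value is non-null.
            return "if (!" + isNullVar + ") { " + body + " }";
        }
        // The value can never be null, so emit the body directly,
        // avoiding a useless "if (!false)" wrapper in the generated code.
        return body;
    }

    public static void main(String[] args) {
        System.out.println(nullSafeExec(true, "isNull_0", "result = value_0 + 1;"));
        System.out.println(nullSafeExec(false, "false", "result = value_0 + 1;"));
    }
}
```

Call sites that build code for non-nullable expressions then stop paying the readability and bytecode cost of guards that can never be false.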
[jira] [Commented] (SPARK-18016) Code Generation: Constant Pool Past Limit for Wide/Nested Dataset
[ https://issues.apache.org/jira/browse/SPARK-18016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16265993#comment-16265993 ] ABHISHEK CHOUDHARY commented on SPARK-18016: I found the same issue in the latest Spark 2.2.0 while using PySpark. The number of columns I expect is more than 50K; do you think the patch will handle that many columns as well? > Code Generation: Constant Pool Past Limit for Wide/Nested Dataset > - > > Key: SPARK-18016 > URL: https://issues.apache.org/jira/browse/SPARK-18016 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Aleksander Eskilson >Assignee: Aleksander Eskilson > Fix For: 2.3.0 > > > When attempting to encode collections of large Java objects to Datasets > having very wide or deeply nested schemas, code generation can fail, yielding: > {code} > Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool for > class > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection > has grown past JVM limit of 0x > at > org.codehaus.janino.util.ClassFile.addToConstantPool(ClassFile.java:499) > at > org.codehaus.janino.util.ClassFile.addConstantNameAndTypeInfo(ClassFile.java:439) > at > org.codehaus.janino.util.ClassFile.addConstantMethodrefInfo(ClassFile.java:358) > at > org.codehaus.janino.UnitCompiler.writeConstantMethodrefInfo(UnitCompiler.java:4) > at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4547) > at org.codehaus.janino.UnitCompiler.access$7500(UnitCompiler.java:206) > at > org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3774) > at > org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3762) > at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328) > at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3762) > at > org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4933) > at 
org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:3180) > at org.codehaus.janino.UnitCompiler.access$5000(UnitCompiler.java:206) > at > org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3151) > at > org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3139) > at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328) > at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3139) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2112) > at org.codehaus.janino.UnitCompiler.access$1700(UnitCompiler.java:206) > at > org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1377) > at > org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1370) > at org.codehaus.janino.Java$ExpressionStatement.accept(Java.java:2558) > at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1370) > at > org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1450) > at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2811) > at > org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1262) > at > org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1234) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:538) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:890) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:894) > at org.codehaus.janino.UnitCompiler.access$600(UnitCompiler.java:206) > at > org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:377) > at > org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:369) > at > org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1128) > at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:369) > at > org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:1209) > at 
org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:564) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:420) > at org.codehaus.janino.UnitCompiler.access$400(UnitCompiler.java:206) > at > org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:374) > at > org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:369) > at > org.codehaus.janino.Java$AbstractPackageMemberClassDeclaration.accept(Java.java:1309) > at
[jira] [Assigned] (SPARK-22608) Avoid code duplication regarding CodeGeneration.splitExpressions()
[ https://issues.apache.org/jira/browse/SPARK-22608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22608: Assignee: (was: Apache Spark) > Avoid code duplication regarding CodeGeneration.splitExpressions() > -- > > Key: SPARK-22608 > URL: https://issues.apache.org/jira/browse/SPARK-22608 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Kazuaki Ishizaki > > Since {{CodeGenerator.splitExpressions}} is called in several places with > {{ctx.INPUT_ROW}}, it would be good to provide an API for this to avoid code > duplication.
[jira] [Commented] (SPARK-22608) Avoid code duplication regarding CodeGeneration.splitExpressions()
[ https://issues.apache.org/jira/browse/SPARK-22608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16265952#comment-16265952 ] Apache Spark commented on SPARK-22608: -- User 'kiszk' has created a pull request for this issue: https://github.com/apache/spark/pull/19821 > Avoid code duplication regarding CodeGeneration.splitExpressions() > -- > > Key: SPARK-22608 > URL: https://issues.apache.org/jira/browse/SPARK-22608 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Kazuaki Ishizaki > > Since {{CodeGenerator.splitExpressions}} is called in several places with > {{ctx.INPUT_ROW}}, it would be good to provide an API for this to avoid code > duplication.
[jira] [Assigned] (SPARK-22608) Avoid code duplication regarding CodeGeneration.splitExpressions()
[ https://issues.apache.org/jira/browse/SPARK-22608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22608: Assignee: Apache Spark > Avoid code duplication regarding CodeGeneration.splitExpressions() > -- > > Key: SPARK-22608 > URL: https://issues.apache.org/jira/browse/SPARK-22608 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Kazuaki Ishizaki >Assignee: Apache Spark > > Since {{CodeGenerator.splitExpressions}} is called in several places with > {{ctx.INPUT_ROW}}, it would be good to provide an API for this to avoid code > duplication.
[jira] [Updated] (SPARK-22608) Avoid code duplication regarding CodeGeneration.splitExpressions()
[ https://issues.apache.org/jira/browse/SPARK-22608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-22608: - Summary: Avoid code duplication regarding CodeGeneration.splitExpressions() (was: Add new API to CodeGeneration.splitExpressions()) > Avoid code duplication regarding CodeGeneration.splitExpressions() > -- > > Key: SPARK-22608 > URL: https://issues.apache.org/jira/browse/SPARK-22608 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Kazuaki Ishizaki > > Since {{CodeGenerator.splitExpressions}} is called in several places with > {{ctx.INPUT_ROW}}, it would be good to provide an API for this to avoid code > duplication.
[jira] [Created] (SPARK-22608) Add new API to CodeGeneration.splitExpressions()
Kazuaki Ishizaki created SPARK-22608: Summary: Add new API to CodeGeneration.splitExpressions() Key: SPARK-22608 URL: https://issues.apache.org/jira/browse/SPARK-22608 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.0 Reporter: Kazuaki Ishizaki Since {{CodeGenerator.splitExpressions}} is called in several places with {{ctx.INPUT_ROW}}, it would be good to provide an API for this to avoid code duplication.
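The proposal above can be illustrated with a small sketch (all names and signatures here are hypothetical stand-ins, not Spark's real API): an overload that defaults the row argument removes the duplication at the many call sites that all pass the same `ctx.INPUT_ROW`.

```java
// Illustrative mock-up of the proposed convenience API for splitting
// generated expressions. The real splitExpressions also breaks code into
// methods to stay under JVM size limits; this sketch only models the
// API shape that removes the repeated ctx.INPUT_ROW argument.
import java.util.List;

public class SplitExpressionsSketch {
    static final String INPUT_ROW = "i"; // stand-in for ctx.INPUT_ROW

    // Existing-style entry point: every caller passes the row variable.
    static String splitExpressions(String row, List<String> expressions) {
        return String.join("\n", expressions).replace("$row", row);
    }

    // Proposed overload: defaults to INPUT_ROW, so call sites no longer
    // repeat the same argument everywhere.
    static String splitExpressions(List<String> expressions) {
        return splitExpressions(INPUT_ROW, expressions);
    }

    public static void main(String[] args) {
        List<String> exprs = List.of("a = $row.getInt(0);", "b = $row.getLong(1);");
        System.out.println(splitExpressions(exprs));
    }
}
```

The design choice is simply a defaulted-argument overload: existing callers keep working, and new callers that want the common row variable get a shorter call.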