[jira] [Commented] (SPARK-25960) Support subpath mounting with Kubernetes
[ https://issues.apache.org/jira/browse/SPARK-25960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745347#comment-16745347 ] Luciano Resende commented on SPARK-25960: - [~vanzin] Any chance this can be backported to 2.4.x? Is it a matter of providing a PR to the 2.4 branch? > Support subpath mounting with Kubernetes > > > Key: SPARK-25960 > URL: https://issues.apache.org/jira/browse/SPARK-25960 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Timothy Chen >Assignee: Nihar Sheth >Priority: Major > Fix For: 3.0.0 > > > Currently we support mounting volumes into executor and driver, but there is > no option to provide a subpath to be mounted from the volume. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
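The feature tracked above is exercised purely through configuration. A sketch of the resulting spark-submit invocation for a PersistentVolumeClaim volume named `data` (claim name, paths, and master URL are illustrative):

```
spark-submit \
  --master k8s://https://kube-apiserver:6443 \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.data.options.claimName=my-claim \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.data.mount.path=/checkpoints \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.data.mount.subPath=checkpoints \
  ...
```

The mount.subPath key is the piece this issue adds; without it the entire volume is mounted at mount.path.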
[jira] [Commented] (SPARK-22865) Publish Official Apache Spark Docker images
[ https://issues.apache.org/jira/browse/SPARK-22865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689695#comment-16689695 ] Luciano Resende commented on SPARK-22865: - Any update on this issue? [~akorzhuev] seems to have things automated and maybe we should enhance it and make it part of the release process? If folks are ok I might take a quick look into it. > Publish Official Apache Spark Docker images > --- > > Key: SPARK-22865 > URL: https://issues.apache.org/jira/browse/SPARK-22865 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.3.0 >Reporter: Anirudh Ramanathan >Priority: Major >
[jira] [Comment Edited] (SPARK-22865) Publish Official Apache Spark Docker images
[ https://issues.apache.org/jira/browse/SPARK-22865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689695#comment-16689695 ] Luciano Resende edited comment on SPARK-22865 at 11/16/18 5:23 PM: --- Any update on this issue? [~akorzhuev] seems to have things automated and maybe we should enhance it and make it part of the release process? If folks are ok I might take a quick look into it as I need these images for a related project. was (Author: luciano resende): Any update on this issue? [~akorzhuev] seems to have things automated and maybe we should enhance it and make it part of the release process? If folks are ok I might take a quick look into it. > Publish Official Apache Spark Docker images > --- > > Key: SPARK-22865 > URL: https://issues.apache.org/jira/browse/SPARK-22865 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.3.0 >Reporter: Anirudh Ramanathan >Priority: Major >
[jira] [Created] (SPARK-24679) Download page should not link to unreleased code
Luciano Resende created SPARK-24679: --- Summary: Download page should not link to unreleased code Key: SPARK-24679 URL: https://issues.apache.org/jira/browse/SPARK-24679 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 2.3.1 Reporter: Luciano Resende The download pages currently link to the git code repository. Whilst the instructions show how to check out master or a particular release branch, this also gives access to the rest of the repo, i.e. to non-released code. Links to code repos should only be published on pages intended for developers.
[jira] [Resolved] (SPARK-13736) Big-Endian platform issues
[ https://issues.apache.org/jira/browse/SPARK-13736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luciano Resende resolved SPARK-13736. - Resolution: Invalid We are using the big-endian label to track platform related issues. > Big-Endian platform issues > --- > > Key: SPARK-13736 > URL: https://issues.apache.org/jira/browse/SPARK-13736 > Project: Spark > Issue Type: Epic > Components: SQL >Affects Versions: 1.6.0 >Reporter: Luciano Resende >Priority: Critical > > We are starting to see a few issues when building/testing on Big-Endian > platform. This serves as an umbrella jira to group all platform specific > issues.
[jira] [Updated] (SPARK-20984) Reading back from ORC format gives error on big endian systems.
[ https://issues.apache.org/jira/browse/SPARK-20984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luciano Resende updated SPARK-20984: Labels: big-endian (was: ) > Reading back from ORC format gives error on big endian systems. > --- > > Key: SPARK-20984 > URL: https://issues.apache.org/jira/browse/SPARK-20984 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 2.0.0 > Environment: Redhat 7 on power 7 Big endian platform. > [testuser@soe10-vm12 spark]$ cat /etc/redhat- > redhat-access-insights/ redhat-release > [testuser@soe10-vm12 spark]$ cat /etc/redhat-release > Red Hat Enterprise Linux Server release 7.2 (Maipo) > [testuser@soe10-vm12 spark]$ lscpu > Architecture: ppc64 > CPU op-mode(s):32-bit, 64-bit > Byte Order:Big Endian > CPU(s):8 > On-line CPU(s) list: 0-7 > Thread(s) per core:1 > Core(s) per socket:1 > Socket(s): 8 > NUMA node(s): 1 > Model: IBM pSeries (emulated by qemu) > L1d cache: 32K > L1i cache: 32K > NUMA node0 CPU(s): 0-7 > [testuser@soe10-vm12 spark]$ >Reporter: Mahesh > Labels: big-endian > > All orc test cases seem to be failing here. Looks like spark is not able to > read back what is written. Following is a way to check it on spark shell. I > am also pasting the test case which probably passes on x86. > All test cases in OrcHadoopFsRelationSuite.scala are failing. 
> test("SPARK-12218: 'Not' is included in ORC filter pushdown") { > import testImplicits._ > withSQLConf(SQLConf.ORC_FILTER_PUSHDOWN_ENABLED.key -> "true") { > withTempPath { dir => > val path = s"${dir.getCanonicalPath}/table1" > (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", > "b").write.orc(path) > checkAnswer( > spark.read.orc(path).where("not (a = 2) or not(b in ('1'))"), > (1 to 5).map(i => Row(i, (i % 2).toString))) > checkAnswer( > spark.read.orc(path).where("not (a = 2 and b in ('1'))"), > (1 to 5).map(i => Row(i, (i % 2).toString))) > } > } > } > Same can be reproduced on spark shell > **Create a DF and write it in orc > scala> (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", > "b").write.orc("test") > **Now try to read it back > scala> spark.read.orc("test").where("not (a = 2) or not(b in ('1'))").show > 17/06/05 04:20:48 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) > org.iq80.snappy.CorruptionException: Invalid copy offset for opcode starting > at 13 > at > org.iq80.snappy.SnappyDecompressor.decompressAllTags(SnappyDecompressor.java:165) > at > org.iq80.snappy.SnappyDecompressor.uncompress(SnappyDecompressor.java:76) > at org.iq80.snappy.Snappy.uncompress(Snappy.java:43) > at > org.apache.hadoop.hive.ql.io.orc.SnappyCodec.decompress(SnappyCodec.java:71) > at > org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:214) > at > org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:238) > at java.io.InputStream.read(InputStream.java:101) > at > org.apache.hive.com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:737) > at > org.apache.hive.com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701) > at > org.apache.hive.com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99) > at > org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.(OrcProto.java:10661) > at > 
org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.(OrcProto.java:10625) > at > org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:10730) > at > org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:10725) > at > org.apache.hive.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200) > at > org.apache.hive.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217) > at > org.apache.hive.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223) > at > org.apache.hive.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49) > at > org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.parseFrom(OrcProto.java:10937) > at > org.apache.hadoop.hive.ql.io.orc.MetadataReader.readStripeFooter(MetadataReader.java:113) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripeFooter(RecordReaderImpl.java:228) > at > org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.beginReadStripe(RecordReaderImpl.java:805) > at >
[jira] [Reopened] (SPARK-5159) Thrift server does not respect hive.server2.enable.doAs=true
[ https://issues.apache.org/jira/browse/SPARK-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luciano Resende reopened SPARK-5159: Reopened due to comments above > Thrift server does not respect hive.server2.enable.doAs=true > > > Key: SPARK-5159 > URL: https://issues.apache.org/jira/browse/SPARK-5159 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.0 >Reporter: Andrew Ray > Attachments: spark_thrift_server_log.txt > > > I'm currently testing the spark sql thrift server on a kerberos secured > cluster in YARN mode. Currently any user can access any table regardless of > HDFS permissions as all data is read as the hive user. In HiveServer2 the > property hive.server2.enable.doAs=true causes all access to be done as the > submitting user. We should do the same.
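For readers unfamiliar with the HiveServer2 behavior being referenced: impersonation is switched on by a single property, with Hadoop's proxyuser settings granting the server the right to act as the submitting user. A minimal sketch (the wildcard values are placeholders; the proxyuser entries belong in Hadoop's core-site.xml, not hive-site.xml):

```xml
<!-- hive-site.xml: run queries as the submitting user, not the server user -->
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>

<!-- core-site.xml: allow the hive service user to impersonate others -->
<property>
  <name>hadoop.proxyuser.hive.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>*</value>
</property>
```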
[jira] [Created] (SPARK-19249) Update Download page to describe how to download archived releases
Luciano Resende created SPARK-19249: --- Summary: Update Download page to describe how to download archived releases Key: SPARK-19249 URL: https://issues.apache.org/jira/browse/SPARK-19249 Project: Spark Issue Type: Bug Components: Documentation Reporter: Luciano Resende Priority: Minor Very often users come to the mailing list to ask where they could download old Spark releases. We should document the archive location on the Download page.
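To make the eventual documentation concrete, archived releases live under a stable URL layout on archive.apache.org; a sketch of fetching an old release (version and package flavor shown are illustrative):

```
# Old releases are removed from the mirrors but kept indefinitely in the archive:
wget https://archive.apache.org/dist/spark/spark-1.6.3/spark-1.6.3-bin-hadoop2.6.tgz
```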
[jira] [Commented] (SPARK-17418) Spark release must NOT distribute Kinesis related assembly artifact
[ https://issues.apache.org/jira/browse/SPARK-17418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507935#comment-15507935 ] Luciano Resende commented on SPARK-17418: - Thanks [~fielding]. It seems we are ok to publish the code that depends on the Amazon-licensed library, and [~joshrosen]'s pull request takes care of the other aspect, which is removing the publishing of the assembly jar (only). > Spark release must NOT distribute Kinesis related assembly artifact > --- > > Key: SPARK-17418 > URL: https://issues.apache.org/jira/browse/SPARK-17418 > Project: Spark > Issue Type: Bug > Components: Build, Streaming >Affects Versions: 1.6.2, 2.0.0 >Reporter: Luciano Resende >Priority: Blocker > > The Kinesis streaming connector is based on the Amazon Software License, and > based on the Apache Legal resolved issues > (http://www.apache.org/legal/resolved.html#category-x) it's not allowed to be > distributed by Apache projects. > More details are available in LEGAL-198
[jira] [Updated] (SPARK-17422) Update Ganglia project with new license
[ https://issues.apache.org/jira/browse/SPARK-17422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luciano Resende updated SPARK-17422: Description: It seems that Ganglia is now BSD licensed http://ganglia.info/ and https://sourceforge.net/p/ganglia/code/1397/ And the library we depend on for the ganglia connector has been moved to BSD license as well. We should update the ganglia spark project to leverage these license changes, and enable it on Spark. was: It seems that Ganglia is now BSD licensed http://ganglia.info/ and https://sourceforge.net/p/ganglia/code/1397/ And the library we depend on for the ganglia connector has been moved to BSD license as well. We should update the ganglia spark connector/dependencies to leverage these license changes, and enable it on Spark. > Update Ganglia project with new license > --- > > Key: SPARK-17422 > URL: https://issues.apache.org/jira/browse/SPARK-17422 > Project: Spark > Issue Type: Bug > Components: Build, Streaming >Affects Versions: 1.6.2, 2.0.0 >Reporter: Luciano Resende > > It seems that Ganglia is now BSD licensed > http://ganglia.info/ and https://sourceforge.net/p/ganglia/code/1397/ > And the library we depend on for the ganglia connector has been moved to BSD > license as well. > We should update the ganglia spark project to leverage these license changes, > and enable it on Spark. >
[jira] [Created] (SPARK-17422) Update Ganglia project with new license
Luciano Resende created SPARK-17422: --- Summary: Update Ganglia project with new license Key: SPARK-17422 URL: https://issues.apache.org/jira/browse/SPARK-17422 Project: Spark Issue Type: Bug Components: Build, Streaming Affects Versions: 2.0.0, 1.6.2 Reporter: Luciano Resende It seems that Ganglia is now BSD licensed http://ganglia.info/ and https://sourceforge.net/p/ganglia/code/1397/ And the library we depend on for the ganglia connector has been moved to BSD license as well. We should update the ganglia spark connector/dependencies to leverage these license changes, and enable it on Spark.
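As context, the Ganglia connector already sits behind an opt-in Maven profile because of the prior license, so enabling it after the relicense would largely be a build-flag change. A sketch (the profile name matches the one in the Spark build; the other flags are illustrative):

```
./build/mvn -Pspark-ganglia-lgpl -DskipTests clean package
```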
[jira] [Commented] (SPARK-17418) Spark release must NOT distribute Kinesis related artifacts
[ https://issues.apache.org/jira/browse/SPARK-17418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15468348#comment-15468348 ] Luciano Resende commented on SPARK-17418: - I am going to create a PR for this, basically removing it from the release publish process. Will have to see what the side effects are on the overall build, as there is embedded support for it in Python and the samples (which seems to convey that this is not really an optional package). > Spark release must NOT distribute Kinesis related artifacts > --- > > Key: SPARK-17418 > URL: https://issues.apache.org/jira/browse/SPARK-17418 > Project: Spark > Issue Type: Bug > Components: Build, Streaming >Affects Versions: 1.6.2, 2.0.0 >Reporter: Luciano Resende >Priority: Critical > > The Kinesis streaming connector is based on the Amazon Software License, and > based on the Apache Legal resolved issues > (http://www.apache.org/legal/resolved.html#category-x) it's not allowed to be > distributed by Apache projects. > More details are available in LEGAL-198
[jira] [Created] (SPARK-17418) Spark release must NOT distribute Kinesis related artifacts
Luciano Resende created SPARK-17418: --- Summary: Spark release must NOT distribute Kinesis related artifacts Key: SPARK-17418 URL: https://issues.apache.org/jira/browse/SPARK-17418 Project: Spark Issue Type: Bug Components: Build, Streaming Affects Versions: 2.0.0, 1.6.2 Reporter: Luciano Resende Priority: Critical The Kinesis streaming connector is based on the Amazon Software License, and based on the Apache Legal resolved issues (http://www.apache.org/legal/resolved.html#category-x) it's not allowed to be distributed by Apache projects. More details are available in LEGAL-198
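The release-side change is mostly a question of which Maven profiles the release scripts enable: the Kinesis connector only builds under an opt-in profile, so dropping that profile from the artifact-publishing step removes it from the distribution. A sketch (the profile name matches the one in the Spark build; the other flags are illustrative):

```
# With the profile the connector is compiled and packaged; without it, skipped.
./build/mvn -Pkinesis-asl -DskipTests clean package
```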
[jira] [Commented] (SPARK-17392) Refactor rat build on Travis.CI PR build to avoid timeouts
[ https://issues.apache.org/jira/browse/SPARK-17392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15462244#comment-15462244 ] Luciano Resende commented on SPARK-17392: - After these changes I finally started getting more green builds on Travis.CI > Refactor rat build on Travis.CI PR build to avoid timeouts > -- > > Key: SPARK-17392 > URL: https://issues.apache.org/jira/browse/SPARK-17392 > Project: Spark > Issue Type: Bug > Components: Build >Reporter: Luciano Resende > > Some PR builds are failing with timeout when executing RAT checks, and these > checks are running on each of the individual builds. > >> > No output has been received in the last 10 minutes, this potentially > indicates a stalled build or something wrong with the build itself. > The build has been terminated > >> > Moving RAT to its own profile and running only once per PR build seems to > alleviate these timeouts.
[jira] [Created] (SPARK-17392) Refactor rat build on Travis.CI PR build to avoid timeouts
Luciano Resende created SPARK-17392: --- Summary: Refactor rat build on Travis.CI PR build to avoid timeouts Key: SPARK-17392 URL: https://issues.apache.org/jira/browse/SPARK-17392 Project: Spark Issue Type: Bug Components: Build Reporter: Luciano Resende Some PR builds are failing with timeout when executing RAT checks, and these checks are running on each of the individual builds. >> No output has been received in the last 10 minutes, this potentially indicates a stalled build or something wrong with the build itself. The build has been terminated >> Moving RAT to its own profile and running only once per PR build seems to alleviate these timeouts.
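A sketch of the "own profile" approach described above, using the real apache-rat-plugin coordinates (the profile id, phase, and binding shown are illustrative):

```xml
<profile>
  <id>rat</id>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.rat</groupId>
        <artifactId>apache-rat-plugin</artifactId>
        <executions>
          <execution>
            <phase>verify</phase>
            <goals>
              <goal>check</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</profile>
```

With this in place, a PR build can run the license check exactly once, via something like mvn -Prat verify, instead of once per sub-build.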
[jira] [Created] (SPARK-17023) Update Kafka connector to use Kafka 0.10.0.1
Luciano Resende created SPARK-17023: --- Summary: Update Kafka connector to use Kafka 0.10.0.1 Key: SPARK-17023 URL: https://issues.apache.org/jira/browse/SPARK-17023 Project: Spark Issue Type: Improvement Components: Build Reporter: Luciano Resende Priority: Minor Update Kafka connector to use latest version of Kafka dependencies (0.10.0.1)
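The bump itself is a one-line coordinate change wherever the connector declares its Kafka dependency; a sketch using the artifact as published to Maven Central (Scala 2.11 variant shown):

```xml
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.11</artifactId>
  <version>0.10.0.1</version>
</dependency>
```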
[jira] [Commented] (SPARK-13979) Killed executor is respawned without AWS keys in standalone spark cluster
[ https://issues.apache.org/jira/browse/SPARK-13979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390831#comment-15390831 ] Luciano Resende commented on SPARK-13979: - This scenario usually happens when you pass specific configurations via HadoopConfiguration when creating the Spark context:

val conf = new SparkConf().setAppName("test").setMaster("local")
sc = new SparkContext(conf)
sc.hadoopConfiguration.set("key1", "value1")

Then, across the Spark code, sometimes we seem to be doing the right thing, like in SessionState.scala:

def newHadoopConf(): Configuration = {
  val hadoopConf = new Configuration(sparkSession.sparkContext.hadoopConfiguration)
  conf.getAllConfs.foreach { case (k, v) => if (v ne null) hadoopConf.set(k, v) }
  hadoopConf
}

But in other places we seem to be ignoring the provided HadoopConfiguration, like in DataSourceStrategy.scala, where we call SparkHadoopUtil.get.conf and its implementation creates an empty hadoop configuration:

def newConfiguration(conf: SparkConf): Configuration = {
  val hadoopConf = new Configuration()
  appendS3AndSparkHadoopConfigurations(conf, hadoopConf)
  hadoopConf
}

Note that, in this case, S3 might still work when AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are available as environment variables, as there is some magic to update hadoopConf with the proper information based on these values. But if you are providing other configurations in hadoopConfig programmatically when creating the spark context then this will be broken. I am trying to investigate it further to see if I can find a more centralized place to work around/fix and always honor any programmatically provided hadoop configuration. 
> Killed executor is respawned without AWS keys in standalone spark cluster > - > > Key: SPARK-13979 > URL: https://issues.apache.org/jira/browse/SPARK-13979 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2 > Environment: I'm using Spark 1.5.2 with Hadoop 2.7 and running > experiments on a simple standalone cluster: > 1 master > 2 workers > All ubuntu 14.04 with Java 8/Scala 2.10 >Reporter: Allen George > > I'm having a problem where respawning a failed executor during a job that > reads/writes parquet on S3 causes subsequent tasks to fail because of missing > AWS keys. > h4. Setup: > I'm using Spark 1.5.2 with Hadoop 2.7 and running experiments on a simple > standalone cluster: > 1 master > 2 workers > My application is co-located on the master machine, while the two workers are > on two other machines (one worker per machine). All machines are running in > EC2. I've configured my setup so that my application executes its task on two > executors (one executor per worker). > h4. Application: > My application reads and writes parquet files on S3. I set the AWS keys on > the SparkContext by doing: > val sc = new SparkContext() > val hadoopConf = sc.hadoopConfiguration > hadoopConf.set("fs.s3n.awsAccessKeyId", "SOME_KEY") > hadoopConf.set("fs.s3n.awsSecretAccessKey", "SOME_SECRET") > At this point I'm done, and I go ahead and use "sc". > h4. Issue: > I can read and write parquet files without a problem with this setup. *BUT* > if an executor dies during a job and is respawned by a worker, tasks fail > with the following error: > "Caused by: java.lang.IllegalArgumentException: AWS Access Key ID and Secret > Access Key must be specified as the username or password (respectively) of a > s3n URL, or by setting the {{fs.s3n.awsAccessKeyId}} or > {{fs.s3n.awsSecretAccessKey}} properties (respectively)." > h4. Basic analysis > I think I've traced this down to the following: > SparkHadoopUtil is initialized with an empty {{SparkConf}}. 
Later, classes > like {{DataSourceStrategy}} simply call {{SparkHadoopUtil.get.conf}} and > access the (now invalid; missing various properties) {{HadoopConfiguration}} > that's built from this empty {{SparkConf}} object. It's unclear to me why > this is done, and it seems that the code as written would cause broken > results anytime callers use {{SparkHadoopUtil.get.conf}} directly.
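A workaround that sidesteps the analysis above: appendS3AndSparkHadoopConfigurations also copies any SparkConf entry prefixed with spark.hadoop. into each freshly built Hadoop Configuration, so credentials passed that way survive executor respawns even when the programmatically set hadoopConfiguration is lost. A sketch (key values are placeholders):

```
spark-submit \
  --conf spark.hadoop.fs.s3n.awsAccessKeyId=SOME_KEY \
  --conf spark.hadoop.fs.s3n.awsSecretAccessKey=SOME_SECRET \
  my-app.jar
```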
[jira] [Comment Edited] (SPARK-16422) maven 3.3.3 missing from mirror, breaks older builds
[ https://issues.apache.org/jira/browse/SPARK-16422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15366875#comment-15366875 ] Luciano Resende edited comment on SPARK-16422 at 7/7/16 10:25 PM: -- When new releases become available, old ones get moved to archive (e.g. https://archive.apache.org/dist/maven/maven-3/...) If we use the mirrors, we will always have this issue when new maven releases become available, if we use the archive we won't have this issue but we would be putting a little more load on the Apache Infrastructure. Anyway, if you guys think it's worth moving to use the archive release location I could provide a patch. was (Author: luciano resende): When new builds are released, old ones get moved to archive (e.g. https://archive.apache.org/dist/maven/maven-3/...) If we use the mirrors, we will always have this issue when new maven releases become available, if we use the archive we won't have this issue but we would be putting a little more load on the Apache Infrastructure. Anyway, if you guys think it's worth moving to use the archive release location I could provide a patch. > maven 3.3.3 missing from mirror, breaks older builds > > > Key: SPARK-16422 > URL: https://issues.apache.org/jira/browse/SPARK-16422 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.6.2 >Reporter: Thomas Graves >Priority: Critical > Fix For: 1.6.3 > > > Trying to build spark 1.6.2 but it fails because the maven 3.3.3 is gone. > https://www.apache.org/dyn/closer.lua?action=download=/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz > Is this something that we control. I saw the latest 2.0 builds were updating > to 3.3.9 and that exists there. 
> Various mirrors and at least all the ones I've hit are missing it: > http://mirrors.koehn.com/apache//maven/maven-3/
[jira] [Commented] (SPARK-16422) maven 3.3.3 missing from mirror, breaks older builds
[ https://issues.apache.org/jira/browse/SPARK-16422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15366875#comment-15366875 ] Luciano Resende commented on SPARK-16422: - When new builds are released, old ones get moved to archive (e.g. https://archive.apache.org/dist/maven/maven-3/...) If we use the mirrors, we will always have this issue when new maven releases become available; if we use the archive we won't have this issue, but we would be putting a little more load on the Apache Infrastructure. Anyway, if you guys think it's worth moving to use the archive release location I could provide a patch. > maven 3.3.3 missing from mirror, breaks older builds > > > Key: SPARK-16422 > URL: https://issues.apache.org/jira/browse/SPARK-16422 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.6.2 >Reporter: Thomas Graves >Priority: Critical > Fix For: 1.6.3 > > > Trying to build spark 1.6.2 but it fails because maven 3.3.3 is gone. > https://www.apache.org/dyn/closer.lua?action=download&filename=/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz > Is this something that we control? I saw the latest 2.0 builds were updating > to 3.3.9 and that exists there. > Various mirrors and at least all the ones I've hit are missing it: > http://mirrors.koehn.com/apache//maven/maven-3/
[jira] [Commented] (SPARK-15370) Some correlated subqueries return incorrect answers
[ https://issues.apache.org/jira/browse/SPARK-15370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327683#comment-15327683 ] Luciano Resende commented on SPARK-15370: - [~hvanhovell] You might need to add [~freiss] to contributor group in Spark jira admin console in order to assign the ticket to Fred. If you don't have access to it, maybe [~rxin] might be able to help sort this out. > Some correlated subqueries return incorrect answers > --- > > Key: SPARK-15370 > URL: https://issues.apache.org/jira/browse/SPARK-15370 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Frederick Reiss > > The rewrite introduced in SPARK-14785 has the COUNT bug. The rewrite changes > the semantics of some correlated subqueries when there are tuples from the > outer query block that do not join with the subquery. For example: > {noformat} > spark-sql> create table R(a integer) as values (1); > spark-sql> create table S(b integer); > spark-sql> select R.a from R > > where (select count(*) from S where R.a = S.b) = 0; > Time taken: 2.139 seconds > > spark-sql> > (returns zero rows; the answer should be one row of '1') > {noformat} > This problem also affects the SELECT clause: > {noformat} > spark-sql> select R.a, > > (select count(*) from S where R.a = S.b) as cnt > > from R; > 1 NULL > (the answer should be "1 0") > {noformat} > Some subqueries with COUNT aggregates are *not* affected: > {noformat} > spark-sql> select R.a from R > > where (select count(*) from S where R.a = S.b) > 0; > Time taken: 0.609 seconds > spark-sql> > (Correct answer) > spark-sql> select R.a from R > > where (select count(*) + sum(S.b) from S where R.a = S.b) = 0; > Time taken: 0.553 seconds > spark-sql> > (Correct answer) > {noformat} > Other cases can trigger the variant of the COUNT bug for expressions > involving NULL checks: > {noformat} > spark-sql> select R.a from R > > where (select sum(S.b) is null from S where R.a = S.b); > (returns zero 
rows, should return one row) > {noformat}
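The manual rewrite that avoids the COUNT bug makes the outer join explicit and maps the NULL count back to zero; for the first failing query above it looks like:

```sql
SELECT R.a
FROM R
LEFT OUTER JOIN (SELECT b, COUNT(*) AS cnt FROM S GROUP BY b) T
  ON R.a = T.b
WHERE COALESCE(T.cnt, 0) = 0;
```

With S empty, the derived table T is empty, the left join yields NULL for cnt, and COALESCE restores the expected 0, so the row (1) is returned.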
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.10 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324652#comment-15324652 ] Luciano Resende commented on SPARK-12177: - At Apache Bahir we are working on a release based on Apache Spark 2.0 preview: https://www.mail-archive.com/dev@bahir.apache.org/msg00038.html and we have an outstanding issue to create a Kafka 0.10 connector https://issues.apache.org/jira/browse/BAHIR-9. We would welcome your contributions if the Spark community decides to wait on this issue. > Update KafkaDStreams to new Kafka 0.10 Consumer API > --- > > Key: SPARK-12177 > URL: https://issues.apache.org/jira/browse/SPARK-12177 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.6.0 >Reporter: Nikita Tarasenko > Labels: consumer, kafka > > Kafka 0.9 was already released and it introduces a new consumer API that is not > compatible with the old one. So, I added the new consumer api. I made separate > classes in package org.apache.spark.streaming.kafka.v09 with changed API. I > didn't remove old classes for more backward compatibility. User will not need > to change his old spark applications when he upgrades to a new Spark version. > Please review my changes
[jira] [Updated] (SPARK-5682) Add encrypted shuffle in spark
[ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luciano Resende updated SPARK-5682: --- Labels: stc (was: ) > Add encrypted shuffle in spark > -- > > Key: SPARK-5682 > URL: https://issues.apache.org/jira/browse/SPARK-5682 > Project: Spark > Issue Type: New Feature > Components: Shuffle >Reporter: liyunzhang_intel > Labels: stc > Attachments: Design Document of Encrypted Spark > Shuffle_20150209.docx, Design Document of Encrypted Spark > Shuffle_20150318.docx, Design Document of Encrypted Spark > Shuffle_20150402.docx, Design Document of Encrypted Spark > Shuffle_20150506.docx > > > Encrypted shuffle is enabled in hadoop 2.6 which makes the process of shuffle > data safer. This feature is necessary in spark. AES is a specification for > the encryption of electronic data. There are 5 common modes in AES. CTR is > one of the modes. We use two codecs JceAesCtrCryptoCodec and > OpensslAesCtrCryptoCodec to enable spark encrypted shuffle which is also used > in hadoop encrypted shuffle. JceAesCtrCryptoCodec uses encrypted algorithms the jdk > provides while OpensslAesCtrCryptoCodec uses encrypted algorithms openssl > provides. > Because ugi credential info is used in the process of encrypted shuffle, we > first enable encrypted shuffle on spark-on-yarn framework.
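As background on the CTR mode the description mentions: CTR turns a block cipher into a stream cipher by encrypting a (nonce, counter) pair and XOR-ing the resulting keystream with the data, which makes encryption and decryption the same operation. The self-contained sketch below uses SHA-256 purely as a stand-in keystream generator to show the structure; the codecs named above use real AES via the JDK or OpenSSL, and this is not AES:

```python
import hashlib

def _keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    # Real AES-CTR computes AES_encrypt(key, nonce || counter) here;
    # SHA-256 stands in only to illustrate the mode's structure.
    return hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()

def ctr_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypts or decrypts: XOR-ing with the keystream is its own inverse."""
    out = bytearray()
    for i, byte in enumerate(data):
        block = _keystream_block(key, nonce, i // 32)  # 32-byte keystream blocks
        out.append(byte ^ block[i % 32])
    return bytes(out)
```

Because the same keystream is XOR-ed in both directions, applying ctr_xor twice round-trips to the original bytes, and blocks can be processed independently, which is part of what makes CTR attractive for shuffle I/O.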
[jira] [Updated] (SPARK-6373) Add SSL/TLS for the Netty based BlockTransferService
[ https://issues.apache.org/jira/browse/SPARK-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luciano Resende updated SPARK-6373: --- Labels: stc (was: ) > Add SSL/TLS for the Netty based BlockTransferService > - > > Key: SPARK-6373 > URL: https://issues.apache.org/jira/browse/SPARK-6373 > Project: Spark > Issue Type: New Feature > Components: Block Manager, Shuffle >Affects Versions: 1.2.1 >Reporter: Jeffrey Turpin > Labels: stc > > Add the ability to allow for secure communications (SSL/TLS) for the Netty > based BlockTransferService and the ExternalShuffleClient. This ticket will > hopefully start the conversation around potential designs... Below is a > reference to a WIP prototype which implements this functionality > (prototype)... I have attempted to disrupt as little code as possible and > tried to follow the current code structure (for the most part) in the areas I > modified. I also studied how Hadoop achieves encrypted shuffle > (http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html) > https://github.com/turp1twin/spark/commit/024b559f27945eb63068d1badf7f82e4e7c3621c -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
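As a rough, JDK-only illustration of the TLS machinery such a feature builds on (the linked prototype wires SSL into Netty's channel pipeline, which is not shown here), a client-mode SSLEngine can be created and configured like this:

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

public class TlsEngineDemo {
    // Build a client-mode SSLEngine with default key/trust managers.
    // In a Netty-based transport this engine would typically be wrapped in
    // an SslHandler at the head of the pipeline (omitted in this sketch).
    public static SSLEngine clientEngine() throws Exception {
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, null, null); // defaults: illustration only
        SSLEngine engine = ctx.createSSLEngine();
        engine.setUseClientMode(true);
        return engine;
    }

    public static void main(String[] args) throws Exception {
        SSLEngine engine = clientEngine();
        System.out.println(engine.getUseClientMode()); // true
    }
}
```

A real implementation additionally needs keystore/truststore configuration and hostname verification, which the prototype discussion above is meant to flesh out.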
[jira] [Commented] (SPARK-15451) Spark PR builder should fail if code doesn't compile against JDK 7
[ https://issues.apache.org/jira/browse/SPARK-15451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293951#comment-15293951 ] Luciano Resende commented on SPARK-15451: - Should we actually have some sort of scheduled builds (e.g. nightly builds) that can be used for these purposes (e.g. confirming JDK 7 still works, docker integration tests, etc.)? Otherwise, if we build each PR with both JDK 7 and JDK 8, it might be troublesome in terms of time. > Spark PR builder should fail if code doesn't compile against JDK 7 > -- > > Key: SPARK-15451 > URL: https://issues.apache.org/jira/browse/SPARK-15451 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.0.0 >Reporter: Marcelo Vanzin > > We need to compile certain parts of the build using jdk8, so that we test > things like lambdas. But when possible, we should either compile using jdk7, > or provide jdk7's rt.jar to javac. Otherwise it's way too easy to slip in > jdk8-specific library calls. > I'll take a look at fixing the maven / sbt files, but I'm not sure how to > update the PR builders since this will most probably require at least a new > env variable (to say where jdk7 is). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15309) Bump master to version 2.1.0-SNAPSHOT
Luciano Resende created SPARK-15309: --- Summary: Bump master to version 2.1.0-SNAPSHOT Key: SPARK-15309 URL: https://issues.apache.org/jira/browse/SPARK-15309 Project: Spark Issue Type: Bug Components: Build Affects Versions: 2.0.0 Reporter: Luciano Resende Now that the 2.0 branch has been created, the master version should be updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12660) Rewrite except using anti-join
[ https://issues.apache.org/jira/browse/SPARK-12660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258705#comment-15258705 ] Luciano Resende commented on SPARK-12660: - Please disregard my last pr > Rewrite except using anti-join > -- > > Key: SPARK-12660 > URL: https://issues.apache.org/jira/browse/SPARK-12660 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin > > Similar to SPARK-12656, we can rewrite except in the logical level using > anti-join. This way, we can take advantage of all the benefits of join > implementations (e.g. managed memory, code generation, broadcast joins). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14738) Separate Docker Integration Tests from main spark build
Luciano Resende created SPARK-14738: --- Summary: Separate Docker Integration Tests from main spark build Key: SPARK-14738 URL: https://issues.apache.org/jira/browse/SPARK-14738 Project: Spark Issue Type: Bug Components: Build, SQL Reporter: Luciano Resende Currently docker integration tests are run as part of the main build, but they require dev machines to have the full docker installation set up, which in most cases is not available, and thus the tests will fail. This would separate the tests from the main Spark build and make them optional, so they could then be invoked manually or as part of CI tests -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-14590) Update pull request template with link to jira
[ https://issues.apache.org/jira/browse/SPARK-14590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luciano Resende closed SPARK-14590. --- Resolution: Won't Fix > Update pull request template with link to jira > -- > > Key: SPARK-14590 > URL: https://issues.apache.org/jira/browse/SPARK-14590 > Project: Spark > Issue Type: Improvement >Reporter: Luciano Resende >Priority: Minor > > Update pull request template to have a link to the current jira issue to > facilitate navigation between the two. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14590) Update pull request template with link to jira
Luciano Resende created SPARK-14590: --- Summary: Update pull request template with link to jira Key: SPARK-14590 URL: https://issues.apache.org/jira/browse/SPARK-14590 Project: Spark Issue Type: Improvement Reporter: Luciano Resende Priority: Minor Update pull request template to have a link to the current jira issue to facilitate navigation between the two. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14589) Enhance DB2 JDBC Dialect docker tests
Luciano Resende created SPARK-14589: --- Summary: Enhance DB2 JDBC Dialect docker tests Key: SPARK-14589 URL: https://issues.apache.org/jira/browse/SPARK-14589 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Luciano Resende -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14504) Enable Oracle docker integration tests
Luciano Resende created SPARK-14504: --- Summary: Enable Oracle docker integration tests Key: SPARK-14504 URL: https://issues.apache.org/jira/browse/SPARK-14504 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Luciano Resende Priority: Minor Enable Oracle docker integration tests -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-14442) Error: Cannot load main class from JAR file:/usr/iop/current/spark-client/lib/datanucleus-core-3.2.10.jar
[ https://issues.apache.org/jira/browse/SPARK-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luciano Resende closed SPARK-14442. --- Resolution: Not A Problem Chatted with [~sthota] and the issue seemed to be some spaces between the multiple included jars which was causing Spark to look for the main class in a different file. Removing the spaces seems to resolve the problem. > Error: Cannot load main class from JAR > file:/usr/iop/current/spark-client/lib/datanucleus-core-3.2.10.jar > - > > Key: SPARK-14442 > URL: https://issues.apache.org/jira/browse/SPARK-14442 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 1.4.1, 1.5.1 >Reporter: Sudhakar Thota >Priority: Minor > Attachments: simple-project_2.10-1.0.jar > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
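The failure mode described above (a space after a comma in the jar list makes spark-submit treat the next jar as a separate argument, so Spark looks for the main class in the wrong file) can be sketched as below. The paths, class name, and the assumption that the jars were passed via `--jars` are all illustrative, not taken from the original report:

```shell
# Wrong (hypothetical): the space after the comma splits the argument list, so
# spark-submit may try to load the main class from datanucleus-core-3.2.10.jar:
#   spark-submit --class com.example.Main \
#     --jars /opt/libs/a.jar, /opt/libs/datanucleus-core-3.2.10.jar app.jar
#
# Right: the jar list is a single comma-separated token with no spaces:
#   spark-submit --class com.example.Main \
#     --jars /opt/libs/a.jar,/opt/libs/datanucleus-core-3.2.10.jar app.jar

# The difference as the shell sees it: the stray space adds an extra argument.
jars_wrong=(--jars /opt/libs/a.jar, /opt/libs/b.jar)
jars_right=(--jars /opt/libs/a.jar,/opt/libs/b.jar)
echo "${#jars_wrong[@]} vs ${#jars_right[@]}"   # prints: 3 vs 2
```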
[jira] [Closed] (SPARK-13703) Remove obsolete scala-2.10 source files
[ https://issues.apache.org/jira/browse/SPARK-13703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luciano Resende closed SPARK-13703. --- Resolution: Not A Problem > Remove obsolete scala-2.10 source files > --- > > Key: SPARK-13703 > URL: https://issues.apache.org/jira/browse/SPARK-13703 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.0.0 >Reporter: Luciano Resende >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-13703) Remove obsolete scala-2.10 source files
[ https://issues.apache.org/jira/browse/SPARK-13703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luciano Resende reopened SPARK-13703: - Now that it seems we are dropping Scala 2.10 for Spark 2.0, we can cleanup some Scala 2.10 specific files. > Remove obsolete scala-2.10 source files > --- > > Key: SPARK-13703 > URL: https://issues.apache.org/jira/browse/SPARK-13703 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.0.0 >Reporter: Luciano Resende >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12555) Datasets: data is corrupted when input data is reordered
[ https://issues.apache.org/jira/browse/SPARK-12555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15188447#comment-15188447 ] Luciano Resende commented on SPARK-12555: - This issue is still reproducible in Spark 1.6.x but seems resolved in 2.x. I have added a test case in trunk (PR #11623) to avoid future regression, but please let us know if there is a need to backport fixes. > Datasets: data is corrupted when input data is reordered > > > Key: SPARK-12555 > URL: https://issues.apache.org/jira/browse/SPARK-12555 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 > Environment: ALL platforms on 1.6 >Reporter: Tim Preece > Labels: big-endian > > Testcase > --- > {code} > import org.apache.spark.sql.expressions.Aggregator > import org.apache.spark.{SparkConf, SparkContext} > import org.apache.spark.sql.SQLContext > import org.apache.spark.sql.Dataset > case class people(age: Int, name: String) > object nameAgg extends Aggregator[people, String, String] { > def zero: String = "" > def reduce(b: String, a: people): String = a.name + b > def merge(b1: String, b2: String): String = b1 + b2 > def finish(r: String): String = r > } > object DataSetAgg { > def main(args: Array[String]) { > val conf = new SparkConf().setAppName("DataSetAgg") > val spark = new SparkContext(conf) > val sqlContext = new SQLContext(spark) > import sqlContext.implicits._ > val peopleds: Dataset[people] = sqlContext.sql("SELECT 'Tim Preece' AS > name, 1279869254 AS age").as[people] > peopleds.groupBy(_.age).agg(nameAgg.toColumn).show() > } > } > {code} > Result ( on a Little Endian Platform ) > > {noformat} > +--+--+ > |_1|_2| > +--+--+ > |1279869254|FAILTi| > +--+--+ > {noformat} > Explanation > --- > Internally the String variable in the unsafe row is not updated after an > unsafe row join operation. 
> The displayed string is corrupted and shows part of the integer ( interpreted > as a string ) along with "Ti" > The column names also look different on a Little Endian platform. > Result ( on a Big Endian Platform ) > {noformat} > +--+--+ > | value|nameAgg$(name,age)| > +--+--+ > |1279869254|LIAFTi| > +--+--+ > {noformat} > The following Unit test also fails ( but only explicitly on a Big Endian > platorm ) > {code} > org.apache.spark.sql.DatasetAggregatorSuite > - typed aggregation: class input with reordering *** FAILED *** > Results do not match for query: > == Parsed Logical Plan == > Aggregate [value#748], > [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS > ClassInputAgg$(b,a)#762] > +- AppendColumns , class[a[0]: int, b[0]: string], > class[value[0]: string], [value#748] > +- Project [one AS b#650,1 AS a#651] > +- OneRowRelation$ > > == Analyzed Logical Plan == > value: string, ClassInputAgg$(b,a): int > Aggregate [value#748], > [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS > ClassInputAgg$(b,a)#762] > +- AppendColumns , class[a[0]: int, b[0]: string], > class[value[0]: string], [value#748] > +- Project [one AS b#650,1 AS a#651] > +- OneRowRelation$ > > == Optimized Logical Plan == > Aggregate [value#748], > [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS > ClassInputAgg$(b,a)#762] > +- AppendColumns , class[a[0]: int, b[0]: string], > class[value[0]: string], [value#748] > +- Project [one AS b#650,1 AS a#651] > +- OneRowRelation$ > > == Physical Plan == > TungstenAggregate(key=[value#748], > functions=[(ClassInputAgg$(b#650,a#651),mode=Final,isDistinct=false)], > output=[value#748,ClassInputAgg$(b,a)#762]) > +- TungstenExchange hashpartitioning(value#748,5), None > +- TungstenAggregate(key=[value#748], > functions=[(ClassInputAgg$(b#650,a#651),mode=Partial,isDistinct=false)], > output=[value#748,value#758]) > +- !AppendColumns , class[a[0]: int, b[0]: string], > 
class[value[0]: string], [value#748] >+- Project [one AS b#650,1 AS a#651] > +- Scan OneRowRelation[] > == Results == > !== Correct Answer - 1 == == Spark Answer - 1 == > ![one,1][one,9] (QueryTest.scala:127) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
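The corrupted strings quoted above ("FAILTi" on little-endian, "LIAFTi" on big-endian) follow directly from byte order: the age value 1279869254 is 0x4C494146, whose four bytes read "LIAF" in big-endian order and "FAIL" in little-endian order, so the uncorrected string field is picking up the integer's raw bytes. A minimal sketch of that misinterpretation:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class EndianDemo {
    // Write the int with the given byte order, then reinterpret its four
    // bytes as ASCII, mimicking the unsafe-row field misinterpretation.
    public static String intAsString(int value, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(4).order(order);
        buf.putInt(value);
        return new String(buf.array(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        int age = 1279869254; // 0x4C494146
        System.out.println(intAsString(age, ByteOrder.BIG_ENDIAN));    // LIAF
        System.out.println(intAsString(age, ByteOrder.LITTLE_ENDIAN)); // FAIL
    }
}
```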
[jira] [Updated] (SPARK-12555) Datasets: data is corrupted when input data is reordered
[ https://issues.apache.org/jira/browse/SPARK-12555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luciano Resende updated SPARK-12555: Labels: big-endian (was: ) > Datasets: data is corrupted when input data is reordered > > > Key: SPARK-12555 > URL: https://issues.apache.org/jira/browse/SPARK-12555 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 > Environment: ALL platforms on 1.6 >Reporter: Tim Preece > Labels: big-endian > > Testcase > --- > {code} > import org.apache.spark.sql.expressions.Aggregator > import org.apache.spark.{SparkConf, SparkContext} > import org.apache.spark.sql.SQLContext > import org.apache.spark.sql.Dataset > case class people(age: Int, name: String) > object nameAgg extends Aggregator[people, String, String] { > def zero: String = "" > def reduce(b: String, a: people): String = a.name + b > def merge(b1: String, b2: String): String = b1 + b2 > def finish(r: String): String = r > } > object DataSetAgg { > def main(args: Array[String]) { > val conf = new SparkConf().setAppName("DataSetAgg") > val spark = new SparkContext(conf) > val sqlContext = new SQLContext(spark) > import sqlContext.implicits._ > val peopleds: Dataset[people] = sqlContext.sql("SELECT 'Tim Preece' AS > name, 1279869254 AS age").as[people] > peopleds.groupBy(_.age).agg(nameAgg.toColumn).show() > } > } > {code} > Result ( on a Little Endian Platform ) > > {noformat} > +--+--+ > |_1|_2| > +--+--+ > |1279869254|FAILTi| > +--+--+ > {noformat} > Explanation > --- > Internally the String variable in the unsafe row is not updated after an > unsafe row join operation. > The displayed string is corrupted and shows part of the integer ( interpreted > as a string ) along with "Ti" > The column names also look different on a Little Endian platform. 
> Result ( on a Big Endian Platform ) > {noformat} > +--+--+ > | value|nameAgg$(name,age)| > +--+--+ > |1279869254|LIAFTi| > +--+--+ > {noformat} > The following Unit test also fails ( but only explicitly on a Big Endian > platorm ) > org.apache.spark.sql.DatasetAggregatorSuite > - typed aggregation: class input with reordering *** FAILED *** > Results do not match for query: > == Parsed Logical Plan == > Aggregate [value#748], > [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS > ClassInputAgg$(b,a)#762] > +- AppendColumns , class[a[0]: int, b[0]: string], > class[value[0]: string], [value#748] > +- Project [one AS b#650,1 AS a#651] > +- OneRowRelation$ > > == Analyzed Logical Plan == > value: string, ClassInputAgg$(b,a): int > Aggregate [value#748], > [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS > ClassInputAgg$(b,a)#762] > +- AppendColumns , class[a[0]: int, b[0]: string], > class[value[0]: string], [value#748] > +- Project [one AS b#650,1 AS a#651] > +- OneRowRelation$ > > == Optimized Logical Plan == > Aggregate [value#748], > [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS > ClassInputAgg$(b,a)#762] > +- AppendColumns , class[a[0]: int, b[0]: string], > class[value[0]: string], [value#748] > +- Project [one AS b#650,1 AS a#651] > +- OneRowRelation$ > > == Physical Plan == > TungstenAggregate(key=[value#748], > functions=[(ClassInputAgg$(b#650,a#651),mode=Final,isDistinct=false)], > output=[value#748,ClassInputAgg$(b,a)#762]) > +- TungstenExchange hashpartitioning(value#748,5), None > +- TungstenAggregate(key=[value#748], > functions=[(ClassInputAgg$(b#650,a#651),mode=Partial,isDistinct=false)], > output=[value#748,value#758]) > +- !AppendColumns , class[a[0]: int, b[0]: string], > class[value[0]: string], [value#748] >+- Project [one AS b#650,1 AS a#651] > +- Scan OneRowRelation[] > == Results == > !== Correct Answer - 1 == == Spark Answer - 1 == > ![one,1][one,9] 
(QueryTest.scala:127) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12319) ExchangeCoordinatorSuite fails on big-endian platforms
[ https://issues.apache.org/jira/browse/SPARK-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luciano Resende updated SPARK-12319: Labels: big-endian (was: ) > ExchangeCoordinatorSuite fails on big-endian platforms > -- > > Key: SPARK-12319 > URL: https://issues.apache.org/jira/browse/SPARK-12319 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 > Environment: Problems apparent on BE, LE could be impacted too >Reporter: Adam Roberts >Priority: Critical > Labels: big-endian > > JIRA to cover endian specific problems - since testing 1.6 I've noticed > problems with DataFrames on BE platforms, e.g. > https://issues.apache.org/jira/browse/SPARK-9858 > [~joshrosen] [~yhuai] > Current progress: using com.google.common.io.LittleEndianDataInputStream and > com.google.common.io.LittleEndianDataOutputStream within UnsafeRowSerializer > fixes three test failures in ExchangeCoordinatorSuite but I'm concerned > around performance/wider functional implications > "org.apache.spark.sql.DatasetAggregatorSuite.typed aggregation: class input > with reordering" fails as we expect "one, 1" but instead get "one, 9" - we > believe the issue lies within BitSetMethods.java, specifically around: return > (wi << 6) + subIndex + java.lang.Long.numberOfTrailingZeros(word); -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
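The suspect expression quoted above combines a 64-bit word index with the position of the lowest set bit inside that word. A simplified standalone version of that index arithmetic is shown below; it drops the `subIndex` offset and operates on a plain `long[]` rather than Spark's unsafe memory, so it is an illustration of the formula, not BitSetMethods itself. Note that `Long.numberOfTrailingZeros` acts on the long's value, which is byte-order independent at the Java level; the endianness concern arises only when the word is read from raw bytes.

```java
public class BitIndexDemo {
    // Given an array of 64-bit words forming a bitset, return the global
    // index of the first set bit, mirroring the quoted
    // (wi << 6) + numberOfTrailingZeros(word) arithmetic.
    public static int firstSetBit(long[] words) {
        for (int wi = 0; wi < words.length; wi++) {
            long word = words[wi];
            if (word != 0) {
                // wi << 6 == wi * 64, the number of bits per word
                return (wi << 6) + Long.numberOfTrailingZeros(word);
            }
        }
        return -1; // no bit set
    }

    public static void main(String[] args) {
        long[] words = new long[3];
        words[2] = 1L << 5; // set global bit 2*64 + 5 = 133
        System.out.println(firstSetBit(words)); // 133
    }
}
```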
[jira] [Commented] (SPARK-13736) Big-Endian platform issues
[ https://issues.apache.org/jira/browse/SPARK-13736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185173#comment-15185173 ] Luciano Resende commented on SPARK-13736: - We want to start testing proactively on big-endian and group these platform-specific issues in a central place for ease of use. As for other big-endian issues, if you are aware of any, please add them here; I was not aware of others and didn't find any in my quick search. > Big-Endian platform issues > --- > > Key: SPARK-13736 > URL: https://issues.apache.org/jira/browse/SPARK-13736 > Project: Spark > Issue Type: Epic > Components: SQL >Affects Versions: 1.6.0 >Reporter: Luciano Resende >Priority: Critical > > We are starting to see a few issues when building/testing on the Big-Endian > platform. This serves as an umbrella JIRA to group all platform-specific > issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12555) Datasets: data is corrupted when input data is reordered
[ https://issues.apache.org/jira/browse/SPARK-12555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luciano Resende updated SPARK-12555: Issue Type: Sub-task (was: Bug) Parent: SPARK-13736 > Datasets: data is corrupted when input data is reordered > > > Key: SPARK-12555 > URL: https://issues.apache.org/jira/browse/SPARK-12555 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 > Environment: ALL platforms on 1.6 >Reporter: Tim Preece > > Testcase > --- > {code} > import org.apache.spark.sql.expressions.Aggregator > import org.apache.spark.{SparkConf, SparkContext} > import org.apache.spark.sql.SQLContext > import org.apache.spark.sql.Dataset > case class people(age: Int, name: String) > object nameAgg extends Aggregator[people, String, String] { > def zero: String = "" > def reduce(b: String, a: people): String = a.name + b > def merge(b1: String, b2: String): String = b1 + b2 > def finish(r: String): String = r > } > object DataSetAgg { > def main(args: Array[String]) { > val conf = new SparkConf().setAppName("DataSetAgg") > val spark = new SparkContext(conf) > val sqlContext = new SQLContext(spark) > import sqlContext.implicits._ > val peopleds: Dataset[people] = sqlContext.sql("SELECT 'Tim Preece' AS > name, 1279869254 AS age").as[people] > peopleds.groupBy(_.age).agg(nameAgg.toColumn).show() > } > } > {code} > Result ( on a Little Endian Platform ) > > {noformat} > +--+--+ > |_1|_2| > +--+--+ > |1279869254|FAILTi| > +--+--+ > {noformat} > Explanation > --- > Internally the String variable in the unsafe row is not updated after an > unsafe row join operation. > The displayed string is corrupted and shows part of the integer ( interpreted > as a string ) along with "Ti" > The column names also look different on a Little Endian platform. 
> Result ( on a Big Endian Platform ) > {noformat} > +--+--+ > | value|nameAgg$(name,age)| > +--+--+ > |1279869254|LIAFTi| > +--+--+ > {noformat} > The following Unit test also fails ( but only explicitly on a Big Endian > platorm ) > org.apache.spark.sql.DatasetAggregatorSuite > - typed aggregation: class input with reordering *** FAILED *** > Results do not match for query: > == Parsed Logical Plan == > Aggregate [value#748], > [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS > ClassInputAgg$(b,a)#762] > +- AppendColumns , class[a[0]: int, b[0]: string], > class[value[0]: string], [value#748] > +- Project [one AS b#650,1 AS a#651] > +- OneRowRelation$ > > == Analyzed Logical Plan == > value: string, ClassInputAgg$(b,a): int > Aggregate [value#748], > [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS > ClassInputAgg$(b,a)#762] > +- AppendColumns , class[a[0]: int, b[0]: string], > class[value[0]: string], [value#748] > +- Project [one AS b#650,1 AS a#651] > +- OneRowRelation$ > > == Optimized Logical Plan == > Aggregate [value#748], > [value#748,(ClassInputAgg$(b#650,a#651),mode=Complete,isDistinct=false) AS > ClassInputAgg$(b,a)#762] > +- AppendColumns , class[a[0]: int, b[0]: string], > class[value[0]: string], [value#748] > +- Project [one AS b#650,1 AS a#651] > +- OneRowRelation$ > > == Physical Plan == > TungstenAggregate(key=[value#748], > functions=[(ClassInputAgg$(b#650,a#651),mode=Final,isDistinct=false)], > output=[value#748,ClassInputAgg$(b,a)#762]) > +- TungstenExchange hashpartitioning(value#748,5), None > +- TungstenAggregate(key=[value#748], > functions=[(ClassInputAgg$(b#650,a#651),mode=Partial,isDistinct=false)], > output=[value#748,value#758]) > +- !AppendColumns , class[a[0]: int, b[0]: string], > class[value[0]: string], [value#748] >+- Project [one AS b#650,1 AS a#651] > +- Scan OneRowRelation[] > == Results == > !== Correct Answer - 1 == == Spark Answer - 1 == > ![one,1][one,9] 
(QueryTest.scala:127) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12319) ExchangeCoordinatorSuite fails on big-endian platforms
[ https://issues.apache.org/jira/browse/SPARK-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luciano Resende updated SPARK-12319: Issue Type: Sub-task (was: Bug) Parent: SPARK-13736 > ExchangeCoordinatorSuite fails on big-endian platforms > -- > > Key: SPARK-12319 > URL: https://issues.apache.org/jira/browse/SPARK-12319 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 > Environment: Problems apparent on BE, LE could be impacted too >Reporter: Adam Roberts >Priority: Critical > > JIRA to cover endian specific problems - since testing 1.6 I've noticed > problems with DataFrames on BE platforms, e.g. > https://issues.apache.org/jira/browse/SPARK-9858 > [~joshrosen] [~yhuai] > Current progress: using com.google.common.io.LittleEndianDataInputStream and > com.google.common.io.LittleEndianDataOutputStream within UnsafeRowSerializer > fixes three test failures in ExchangeCoordinatorSuite but I'm concerned > around performance/wider functional implications > "org.apache.spark.sql.DatasetAggregatorSuite.typed aggregation: class input > with reordering" fails as we expect "one, 1" but instead get "one, 9" - we > believe the issue lies within BitSetMethods.java, specifically around: return > (wi << 6) + subIndex + java.lang.Long.numberOfTrailingZeros(word); -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13736) Big-Endian platform issues
Luciano Resende created SPARK-13736: --- Summary: Big-Endian platform issues Key: SPARK-13736 URL: https://issues.apache.org/jira/browse/SPARK-13736 Project: Spark Issue Type: Epic Components: SQL Affects Versions: 1.6.0 Reporter: Luciano Resende Priority: Critical We are starting to see a few issues when building/testing on the Big-Endian platform. This serves as an umbrella JIRA to group all platform-specific issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-6666) org.apache.spark.sql.jdbc.JDBCRDD does not escape/quote column names
[ https://issues.apache.org/jira/browse/SPARK-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luciano Resende closed SPARK-. -- Resolution: Cannot Reproduce I have tried the scenarios above in Spark trunk using both Postgres and DB2, see: https://github.com/lresende/spark-sandbox/blob/master/src/main/scala/com/luck/sql/JDBCApplication.scala And the described issues seems not reproducible anymore, see all results below root |-- Symbol: string (nullable = true) |-- Name: string (nullable = true) |-- Sector: string (nullable = true) |-- Price: double (nullable = true) |-- Dividend Yield: double (nullable = true) |-- Price/Earnings: double (nullable = true) |-- Earnings/Share: double (nullable = true) |-- Book Value: double (nullable = true) |-- 52 week low: double (nullable = true) |-- 52 week high: double (nullable = true) |-- Market Cap: double (nullable = true) |-- EBITDA: double (nullable = true) |-- Price/Sales: double (nullable = true) |-- Price/Book: double (nullable = true) |-- SEC Filings: string (nullable = true) +--+--+--+-+--+--+--+--+---++--+--+---+--+---+ |Symbol| Name|Sector|Price|Dividend Yield|Price/Earnings|Earnings/Share|Book Value|52 week low|52 week high|Market Cap|EBITDA|Price/Sales|Price/Book|SEC Filings| +--+--+--+-+--+--+--+--+---++--+--+---+--+---+ |S1|Name 1| Sec 1| 10.0| 10.0| 10.0| 10.0| 10.0| 10.0|10.0| 10.0| 10.0| 10.0| 10.0| 100| |s2|Name 2| Sec 2| 20.0| 20.0| 20.0| 20.0| 20.0| 20.0|20.0| 20.0| 20.0| 20.0| 20.0| 200| +--+--+--+-+--+--+--+--+---++--+--+---+--+---+ +--+ |AvgCPI| +--+ | 15.0| +--+ +--+--+--+-+--+--+--+--+---++--+--+---+--+---+ |Symbol| Name|Sector|Price|Dividend Yield|Price/Earnings|Earnings/Share|Book Value|52 week low|52 week high|Market Cap|EBITDA|Price/Sales|Price/Book|SEC Filings| +--+--+--+-+--+--+--+--+---++--+--+---+--+---+ |S1|Name 1| Sec 1| 10.0| 10.0| 10.0| 10.0| 10.0| 10.0|10.0| 10.0| 10.0| 10.0| 10.0| 100| |s2|Name 2| Sec 2| 20.0| 20.0| 20.0| 20.0| 20.0| 20.0|20.0| 20.0| 20.0| 20.0| 20.0| 
200| +--+--+--+-+--+--+--+--+---++--+--+---+--+---+ > org.apache.spark.sql.jdbc.JDBCRDD does not escape/quote column names > - > > Key: SPARK- > URL: https://issues.apache.org/jira/browse/SPARK- > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.3.0 > Environment: >Reporter: John Ferguson >Priority: Critical > > Is there a way to have JDBC DataFrames use quoted/escaped column names? > Right now, it looks like it "sees" the names correctly in the schema created > but does not escape them in the SQL it creates when they are not compliant: > org.apache.spark.sql.jdbc.JDBCRDD > > private val columnList: String = { > val sb = new StringBuilder() > columns.foreach(x => sb.append(",").append(x)) > if (sb.length == 0) "1" else sb.substring(1) > } > If you see value in this, I would take a shot at adding the quoting > (escaping) of column names here. If you don't do it, some drivers... like > postgresql's will simply drop case all names when parsing the query. As you > can see in the TL;DR below that means they won't match the schema I am given. > TL;DR: > > I am able to connect to a Postgres database in the shell (with driver > referenced): >val jdbcDf = > sqlContext.jdbc("jdbc:postgresql://localhost/sparkdemo?user=dbuser", "sp500") > In fact when I run: >jdbcDf.registerTempTable("sp500") >val avgEPSNamed = sqlContext.sql("SELECT AVG(`Earnings/Share`) as AvgCPI > FROM sp500") > and >val avgEPSProg = jsonDf.agg(avg(jsonDf.col("Earnings/Share"))) > The values come back as expected. However, if I try: >jdbcDf.show > Or if I try > >val all = sqlContext.sql("SELECT * FROM sp500") >all.show > I get errors about column names not
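The unquoted column-list builder quoted in the report above can be fixed by delimiting each identifier so that mixed-case names such as "Earnings/Share" survive the round trip to databases like PostgreSQL, which lower-case unquoted identifiers. This is a hedged standalone sketch using the ANSI double-quote delimiter, not the actual Spark code; real JDBC dialects vary (e.g. backticks for MySQL), which is why quoting is usually delegated to a per-dialect hook:

```java
import java.util.StringJoiner;

public class ColumnListDemo {
    // Build the SELECT column list with each name quoted; embedded double
    // quotes are escaped by doubling them, per ANSI SQL.
    public static String columnList(String[] columns) {
        if (columns.length == 0) return "1"; // matches the original fallback
        StringJoiner sb = new StringJoiner(",");
        for (String c : columns) {
            sb.add("\"" + c.replace("\"", "\"\"") + "\"");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(columnList(new String[] {"Symbol", "Earnings/Share"}));
        // "Symbol","Earnings/Share"
    }
}
```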
[jira] [Created] (SPARK-13703) Remove obsolete scala-2.10 source files
Luciano Resende created SPARK-13703: --- Summary: Remove obsolete scala-2.10 source files Key: SPARK-13703 URL: https://issues.apache.org/jira/browse/SPARK-13703 Project: Spark Issue Type: Bug Components: Build Affects Versions: 2.0.0 Reporter: Luciano Resende Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13248) Remove deprecated Streaming APIs
Luciano Resende created SPARK-13248: --- Summary: Remove deprecated Streaming APIs Key: SPARK-13248 URL: https://issues.apache.org/jira/browse/SPARK-13248 Project: Spark Issue Type: Improvement Components: Streaming Reporter: Luciano Resende Following the direction of other modules for the 2.0 release, removing deprecated streaming APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-12357) Implement unhandledFilter interface for JDBC
[ https://issues.apache.org/jira/browse/SPARK-12357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luciano Resende closed SPARK-12357. --- Resolution: Duplicate > Implement unhandledFilter interface for JDBC > > > Key: SPARK-12357 > URL: https://issues.apache.org/jira/browse/SPARK-12357 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 1.6.0 >Reporter: Hyukjin Kwon >Priority: Critical > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13190) Update pom.xml to reference Scala 2.11
Luciano Resende created SPARK-13190: --- Summary: Update pom.xml to reference Scala 2.11 Key: SPARK-13190 URL: https://issues.apache.org/jira/browse/SPARK-13190 Project: Spark Issue Type: Sub-task Components: Build Affects Versions: 2.0.0 Reporter: Luciano Resende -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13191) Update LICENSE with Scala 2.11 dependencies
Luciano Resende created SPARK-13191: --- Summary: Update LICENSE with Scala 2.11 dependencies Key: SPARK-13191 URL: https://issues.apache.org/jira/browse/SPARK-13191 Project: Spark Issue Type: Sub-task Components: Build Affects Versions: 2.0.0 Reporter: Luciano Resende -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13194) Update release audit tools to use Scala 2.11
Luciano Resende created SPARK-13194: --- Summary: Update release audit tools to use Scala 2.11 Key: SPARK-13194 URL: https://issues.apache.org/jira/browse/SPARK-13194 Project: Spark Issue Type: Sub-task Components: Build Affects Versions: 2.0.0 Reporter: Luciano Resende -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13193) Update Docker tests to use Scala 2.11
Luciano Resende created SPARK-13193: --- Summary: Update Docker tests to use Scala 2.11 Key: SPARK-13193 URL: https://issues.apache.org/jira/browse/SPARK-13193 Project: Spark Issue Type: Sub-task Components: Build Affects Versions: 2.0.0 Reporter: Luciano Resende -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13189) Cleanup build references to Scala 2.10
Luciano Resende created SPARK-13189: --- Summary: Cleanup build references to Scala 2.10 Key: SPARK-13189 URL: https://issues.apache.org/jira/browse/SPARK-13189 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 2.0.0 Reporter: Luciano Resende There are still a few places referencing Scala 2.10/2.10.5 where it should be 2.11/2.11.7 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5159) Thrift server does not respect hive.server2.enable.doAs=true
[ https://issues.apache.org/jira/browse/SPARK-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luciano Resende resolved SPARK-5159. Resolution: Fixed Fix Version/s: 1.5.2 After some of the proposed fixes went in with SPARK-6910, I am not able to reproduce this issue anymore with Spark 1.5.2, as described in the comments above. If others still see this issue, please reopen this jira and describe the specific steps where it is still valid. > Thrift server does not respect hive.server2.enable.doAs=true > > > Key: SPARK-5159 > URL: https://issues.apache.org/jira/browse/SPARK-5159 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.0 >Reporter: Andrew Ray > Fix For: 1.5.2 > > Attachments: spark_thrift_server_log.txt > > > I'm currently testing the spark sql thrift server on a kerberos secured > cluster in YARN mode. Currently any user can access any table regardless of > HDFS permissions as all data is read as the hive user. In HiveServer2 the > property hive.server2.enable.doAs=true causes all access to be done as the > submitting user. We should do the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
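For reference, the impersonation behavior this thread discusses is controlled by a standard hive-site.xml property (named in the issue itself); a minimal sketch of enabling it:

```xml
<!-- hive-site.xml: execute queries as the submitting user
     rather than as the server (hive) user -->
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>
```

On a Kerberos-secured cluster this is what makes HDFS permission checks apply to the submitting user, producing AccessControlException errors like the one quoted in the comments below when that user lacks access.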
[jira] [Commented] (SPARK-5159) Thrift server does not respect hive.server2.enable.doAs=true
[ https://issues.apache.org/jira/browse/SPARK-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113764#comment-15113764 ] Luciano Resende commented on SPARK-5159: [~zhanzhang] Yes, the user that has proper access does get access to the db artifacts properly in my test environment. > Thrift server does not respect hive.server2.enable.doAs=true > > > Key: SPARK-5159 > URL: https://issues.apache.org/jira/browse/SPARK-5159 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.0 >Reporter: Andrew Ray > Attachments: spark_thrift_server_log.txt > > > I'm currently testing the spark sql thrift server on a kerberos secured > cluster in YARN mode. Currently any user can access any table regardless of > HDFS permissions as all data is read as the hive user. In HiveServer2 the > property hive.server2.enable.doAs=true causes all access to be done as the > submitting user. We should do the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12899) Spark Data Security
Luciano Resende created SPARK-12899: --- Summary: Spark Data Security Key: SPARK-12899 URL: https://issues.apache.org/jira/browse/SPARK-12899 Project: Spark Issue Type: Epic Components: Spark Core Reporter: Luciano Resende While discussing SPARK-5159, we identified that even with impersonation in the context of Hive, we would still have data being shared between users without respecting any security boundaries. In this epic, we want to identify a design that enables better handling of these boundaries at the scope of RDDs -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5159) Thrift server does not respect hive.server2.enable.doAs=true
[ https://issues.apache.org/jira/browse/SPARK-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102702#comment-15102702 ] Luciano Resende commented on SPARK-5159: [~ilovesoup] As I mentioned before, most if not all of your changes have been applied via SPARK-6910 @All, I understand there is a bigger issue here regarding data that is stored outside of Hive, but I would treat that as a separate epic for Spark Data Security, while for this current issue I would like us to concentrate on the remaining issue related to doAs when Kerberos is enabled. > Thrift server does not respect hive.server2.enable.doAs=true > > > Key: SPARK-5159 > URL: https://issues.apache.org/jira/browse/SPARK-5159 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.0 >Reporter: Andrew Ray > Attachments: spark_thrift_server_log.txt > > > I'm currently testing the spark sql thrift server on a kerberos secured > cluster in YARN mode. Currently any user can access any table regardless of > HDFS permissions as all data is read as the hive user. In HiveServer2 the > property hive.server2.enable.doAs=true causes all access to be done as the > submitting user. We should do the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5159) Thrift server does not respect hive.server2.enable.doAs=true
[ https://issues.apache.org/jira/browse/SPARK-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15096893#comment-15096893 ] Luciano Resende commented on SPARK-5159: [~saurfang] Did you find out more about the item you reported? Is this the same issue, or a different issue that we should track in a separate jira? It seems to be working for me on 1.5.2, and I am about to verify on 1.6 > Thrift server does not respect hive.server2.enable.doAs=true > > > Key: SPARK-5159 > URL: https://issues.apache.org/jira/browse/SPARK-5159 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.0 >Reporter: Andrew Ray > > I'm currently testing the spark sql thrift server on a kerberos secured > cluster in YARN mode. Currently any user can access any table regardless of > HDFS permissions as all data is read as the hive user. In HiveServer2 the > property hive.server2.enable.doAs=true causes all access to be done as the > submitting user. We should do the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5159) Thrift server does not respect hive.server2.enable.doAs=true
[ https://issues.apache.org/jira/browse/SPARK-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074085#comment-15074085 ] Luciano Resende commented on SPARK-5159: Is this still an issue? Most of the code from the initial PR seems to have been merged via SPARK-6910, and when I try to run the Spark Hive sample in YARN mode (Spark 1.5.1) it seems my user is being impersonated and I get the proper exception saying my user does not have permission. Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=lresende, access=WRITE, inode="/user/lresende/.sparkStaging/application_1450998431030_0001":hdfs:hdfs:drwxr-xr-x Is there a specific scenario where this is still reproducible? > Thrift server does not respect hive.server2.enable.doAs=true > > > Key: SPARK-5159 > URL: https://issues.apache.org/jira/browse/SPARK-5159 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.0 >Reporter: Andrew Ray > > I'm currently testing the spark sql thrift server on a kerberos secured > cluster in YARN mode. Currently any user can access any table regardless of > HDFS permissions as all data is read as the hive user. In HiveServer2 the > property hive.server2.enable.doAs=true causes all access to be done as the > submitting user. We should do the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11910) Streaming programming guide references wrong dependency version
Luciano Resende created SPARK-11910: --- Summary: Streaming programming guide references wrong dependency version Key: SPARK-11910 URL: https://issues.apache.org/jira/browse/SPARK-11910 Project: Spark Issue Type: Bug Components: Documentation, Streaming Affects Versions: 1.6.0 Reporter: Luciano Resende Priority: Minor SPARK-11245 upgraded the Twitter dependency to 4.0.4 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11245) Upgrade twitter4j to version 4.x
Luciano Resende created SPARK-11245: --- Summary: Upgrade twitter4j to version 4.x Key: SPARK-11245 URL: https://issues.apache.org/jira/browse/SPARK-11245 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.5.1 Reporter: Luciano Resende Fix For: 1.6.0 Twitter4J is already on a 4.x release: https://github.com/yusuke/twitter4j/releases -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-11247) Back to master always tries to use FQDN node name
Luciano Resende created SPARK-11247: --- Summary: Back to master always tries to use FQDN node name Key: SPARK-11247 URL: https://issues.apache.org/jira/browse/SPARK-11247 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.5.1 Reporter: Luciano Resende Fix For: 1.6.0 In a standalone Spark deployment: access the Spark master UI using the IP address of the machine (e.g. DNS is not set up), click on a given worker, then click back to master. Result: 404. The UI always tries to resolve the FQDN of the node instead of using the provided IP, which returns a 404 because there is no DNS entry for the node. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10521) Utilize Docker to test DB2 JDBC Dialect support
Luciano Resende created SPARK-10521: --- Summary: Utilize Docker to test DB2 JDBC Dialect support Key: SPARK-10521 URL: https://issues.apache.org/jira/browse/SPARK-10521 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.4.1, 1.5.0 Reporter: Luciano Resende There was a discussion in SPARK-10170 around using a docker image to execute the DB2 JDBC dialect tests. I will use this jira to work on providing the basic image together with the test integration. We can then extend the testing coverage as needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10521) Utilize Docker to test DB2 JDBC Dialect support
[ https://issues.apache.org/jira/browse/SPARK-10521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737635#comment-14737635 ] Luciano Resende commented on SPARK-10521: - I'll be submitting a PR for this shortly. > Utilize Docker to test DB2 JDBC Dialect support > --- > > Key: SPARK-10521 > URL: https://issues.apache.org/jira/browse/SPARK-10521 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.4.1, 1.5.0 >Reporter: Luciano Resende > > There was a discussion in SPARK-10170 around using a docker image to execute > the DB2 JDBC dialect tests. I will use this jira to work on providing the > basic image together with the test integration. We can then extend the > testing coverage as needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-10398) Migrate Spark download page to use new lua mirroring scripts
[ https://issues.apache.org/jira/browse/SPARK-10398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luciano Resende reopened SPARK-10398: - There are a few other places where closer.cgi is referenced. > Migrate Spark download page to use new lua mirroring scripts > > > Key: SPARK-10398 > URL: https://issues.apache.org/jira/browse/SPARK-10398 > Project: Spark > Issue Type: Task > Components: Project Infra >Reporter: Luciano Resende >Assignee: Sean Owen >Priority: Minor > Fix For: 1.5.0 > > > From infra team : > If you refer to www.apache.org/dyn/closer.cgi, please refer to > www.apache.org/dyn/closer.lua instead from now on. > Any non-conforming CGI scripts are no longer enabled, and are all > rewritten to go to our new mirror system. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10398) Migrate Spark download page to use new lua mirroring scripts
[ https://issues.apache.org/jira/browse/SPARK-10398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luciano Resende updated SPARK-10398: Attachment: SPARK-10398 This patch handles other download links referenced on the Spark docs as well. > Migrate Spark download page to use new lua mirroring scripts > > > Key: SPARK-10398 > URL: https://issues.apache.org/jira/browse/SPARK-10398 > Project: Spark > Issue Type: Task > Components: Project Infra >Reporter: Luciano Resende >Assignee: Sean Owen >Priority: Minor > Fix For: 1.5.0 > > Attachments: SPARK-10398 > > > From infra team : > If you refer to www.apache.org/dyn/closer.cgi, please refer to > www.apache.org/dyn/closer.lua instead from now on. > Any non-conforming CGI scripts are no longer enabled, and are all > rewritten to go to our new mirror system. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10398) Migrate Spark download page to use new lua mirroring scripts
[ https://issues.apache.org/jira/browse/SPARK-10398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14725619#comment-14725619 ] Luciano Resende commented on SPARK-10398: - I can submit a PR for the docs as well, let me look into those. > Migrate Spark download page to use new lua mirroring scripts > > > Key: SPARK-10398 > URL: https://issues.apache.org/jira/browse/SPARK-10398 > Project: Spark > Issue Type: Task > Components: Project Infra >Reporter: Luciano Resende >Assignee: Sean Owen >Priority: Minor > Fix For: 1.5.0 > > Attachments: SPARK-10398 > > > From infra team : > If you refer to www.apache.org/dyn/closer.cgi, please refer to > www.apache.org/dyn/closer.lua instead from now on. > Any non-conforming CGI scripts are no longer enabled, and are all > rewritten to go to our new mirror system. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10175) Enhance spark doap file
[ https://issues.apache.org/jira/browse/SPARK-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720638#comment-14720638 ] Luciano Resende commented on SPARK-10175: - ping Enhance spark doap file --- Key: SPARK-10175 URL: https://issues.apache.org/jira/browse/SPARK-10175 Project: Spark Issue Type: Bug Components: Project Infra Reporter: Luciano Resende Attachments: SPARK-10175 The Spark doap has broken links and is also missing entries related to issue tracker and mailing lists. This affects the list in projects.apache.org and also in the main apache website. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10175) Enhance spark doap file
Luciano Resende created SPARK-10175: --- Summary: Enhance spark doap file Key: SPARK-10175 URL: https://issues.apache.org/jira/browse/SPARK-10175 Project: Spark Issue Type: Bug Components: Project Infra Reporter: Luciano Resende The Spark doap has broken links and is also missing entries related to issue tracker and mailing lists. This affects the list in projects.apache.org and also in the main apache website. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10175) Enhance spark doap file
[ https://issues.apache.org/jira/browse/SPARK-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luciano Resende updated SPARK-10175: Attachment: SPARK-10175 Updates to the doap located on the website svn repository. Enhance spark doap file --- Key: SPARK-10175 URL: https://issues.apache.org/jira/browse/SPARK-10175 Project: Spark Issue Type: Bug Components: Project Infra Reporter: Luciano Resende Attachments: SPARK-10175 The Spark doap has broken links and is also missing entries related to issue tracker and mailing lists. This affects the list in projects.apache.org and also in the main apache website. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org