Re: Ask for ARM CI for spark
Hi all, I am glad to tell you there is new progress on building and testing Spark on an aarch64 server; the tests are running. See the build/test log at https://logs.openlabtesting.org/logs/1/1/419fcb11764048d5a3cda186ea76dd43249e1f97/check/spark-build-arm64/75cc6f5/job-output.txt.gz and the aarch64 instance info at https://logs.openlabtesting.org/logs/1/1/419fcb11764048d5a3cda186ea76dd43249e1f97/check/spark-build-arm64/75cc6f5/zuul-info/zuul-info.ubuntu-xenial-arm64.txt

In order to enable the tests I made some modifications; the major one is building the leveldbjni package locally. I forked the fusesource/leveldbjni and chirino/leveldb repos and modified them so that the local package builds (see https://github.com/huangtianhua/leveldbjni/pull/1 and https://github.com/huangtianhua/leveldbjni/pull/2 ), then used the result in Spark; the details are in https://github.com/theopenlab/spark/pull/1

Not all tests pass yet. I will try to fix them, and any suggestions are welcome. Thank you all.

On Mon, Jul 1, 2019 at 5:25 PM Tianhua huang wrote:
> We are focused on ARM cloud instances, and I currently use an ARM instance from the vexxhost cloud to run the build job mentioned above. The instance's specification is 8 vCPUs and 8 GB of RAM, and we can use a bigger flavor for the ARM instance to run the job if need be.
> On Fri, Jun 28, 2019 at 6:55 PM Steve Loughran wrote:
>> Be interesting to see how well a Pi4 works; with only 4GB of RAM you wouldn't compile with it, but you could try installing the Spark jar bundle and then running against some NFS-mounted disks: https://www.raspberrypi.org/magpi/raspberry-pi-4-specs-benchmarks/ ; unlikely to be fast, but it'd be an efficient kind of slow.
>>
>> On Fri, Jun 28, 2019 at 3:08 AM Rui Chen wrote:
>>> > I think any AA64 work is going to have to define very clearly what "works" is defined as
>>>
>>> +1
>>> It's very valuable to build a clear scope of these projects' functionality for the ARM platform in the upstream community; it brings confidence to end users and customers when they plan to deploy these projects on ARM.
>>>
>>> This is absolutely long-term work; let's take it step by step: CI, testing, issues and resolution.
>>>
>>> On Thu, Jun 27, 2019 at 9:22 PM Steve Loughran wrote:
>>>> leveldb and native codecs are invariably a problem here, as is anything else doing misaligned IO. Protobuf has also had "issues" in the past; see https://issues.apache.org/jira/browse/HADOOP-16100
>>>>
>>>> I think any AA64 work is going to have to define very clearly what "works" is defined as; Spark standalone with a specific set of codecs is probably the first thing to aim for: no Snappy or lz4. Anything which goes near protobuf, checksums, native code, etc. is in trouble. Don't try to deploy with HDFS as the cluster FS, would be my recommendation. If you want a cluster FS, use NFS or one of Google GCS or Azure WASB. And before trying either of those cloud stores, run the filesystem connector test suites (hadoop-azure; google gcs github) to see that they work.
>>>> If the foundational FS test suites fail, nothing else will work.
>>>>
>>>> On Thu, Jun 27, 2019 at 3:09 AM Tianhua huang <huangtianhua...@gmail.com> wrote:
>>>>> I ran the unit tests on my ARM instance earlier and reported an issue at https://issues.apache.org/jira/browse/SPARK-27721; it seems there is no leveldbjni native package for aarch64 in leveldbjni-all.jar (1.8): https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8 . We can see that https://github.com/fusesource/leveldbjni/pull/82 added aarch64 support and was merged on 2 Nov 2017, but the latest release of the repo is from 17 Oct 2013, so unfortunately it does not include the aarch64 support.
>>>>>
>>>>> I will run the tests in the job mentioned above and try to fix the issue above; if anyone has any ideas about it, please reply. Thank you.
>>>>>
>>>>> On Wed, Jun 26, 2019 at 8:11 PM Sean Owen wrote:
>>>>>> Can you begin by testing yourself? I think the first step is to make sure the build and tests work on ARM. If you find problems you can isolate them and try to fix them, or at least report them. It's only worth getting CI in place when we think builds will work.
>>>>>>
>>>>>> On Tue, Jun 25, 2019 at 9:26 PM Tianhua huang <huangtianhua...@gmail.com> wrote:
>>>>>>> Thanks Shane :)
>>>>>>>
>>>>>>> This sounds good, and yes, I agree that it's best to keep the test/build infrastructure in one place. If you can't find the ARM resources, we are willing to provide the ARM instance :) Our goal is to make more open source software compatible with the aarch64 platform, so let's do it. I will be happy if I can give some
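For background on why the stock leveldbjni-all 1.8 jar fails on aarch64: the JNI loader derives a per-platform classifier from the JVM's `os.name`/`os.arch` properties and looks for a matching native library inside the jar, and the 2013-era release contains no aarch64 entry. The sketch below is a simplified, hypothetical reconstruction of that lookup (the real logic lives in HawtJNI's `Library` class and differs in detail); it only illustrates why an aarch64 JVM asks for a classifier the jar cannot provide.

```java
public class PlatformClassifier {
    // Hypothetical simplification of how a JNI wrapper maps os.name/os.arch
    // to the classifier of a bundled native library. leveldbjni-all 1.8
    // ships roughly linux32/linux64/osx/win variants, but nothing that
    // would match an aarch64 Linux JVM.
    static String classifier(String osName, String osArch) {
        String os = osName.toLowerCase();
        String arch = osArch.toLowerCase();
        if (os.startsWith("linux")) {
            if (arch.equals("amd64") || arch.equals("x86_64")) {
                return "linux64";       // present in leveldbjni-all 1.8
            }
            return "linux-" + arch;     // e.g. "linux-aarch64": absent
        }
        return os + "-" + arch;
    }

    public static void main(String[] args) {
        // On an aarch64 Linux JVM the lookup key would be:
        System.out.println(classifier("Linux", "aarch64")); // linux-aarch64
    }
}
```

Since no entry under that key exists in the jar, the native load fails at runtime, which is why rebuilding the package locally (with the merged aarch64 PR) is needed.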
Re: Release Apache Spark 2.4.4 before 3.0.0
Thank you for the reply, Sean. Sure, 2.4.x should be an LTS version. The main reason for a 2.4.4 release (before 3.0.0) is to have a better basis for comparison with 3.0.0. For example, SPARK-27798 was an old bug, but its correctness issue is only exposed in Spark 2.4.3. It would be great if we can have a better basis.

Bests,
Dongjoon.

On Tue, Jul 9, 2019 at 9:52 AM Sean Owen wrote:
> We will certainly want a 2.4.4 release eventually. In fact I'd expect 2.4.x gets maintained for longer than the usual 18 months, as it's the last 2.x branch. It doesn't need to happen before 3.0, but could. Usually maintenance releases happen 3-4 months apart and the last one was 2 months ago. If these are significant issues, sure. It'll probably be August before it's out anyway.
>
> On Tue, Jul 9, 2019 at 11:15 AM Dongjoon Hyun wrote:
>> Hi, All.
>>
>> Spark 2.4.3 was released two months ago (8th May).
>>
>> As of today (9th July), there are 45 fixes in `branch-2.4`, including the following correctness or blocker issues:
>>
>> - SPARK-26038 Decimal toScalaBigInt/toJavaBigInteger not work for decimals not fitting in long
>> - SPARK-26045 Error in the spark 2.4 release package with the spark-avro_2.11 dependency
>> - SPARK-27798 from_avro can modify variables in other rows in local mode
>> - SPARK-27907 HiveUDAF should return NULL in case of 0 rows
>> - SPARK-28157 Make SHS clear KVStore LogInfo for the blacklist entries
>> - SPARK-28308 CalendarInterval sub-second part should be padded before parsing
>>
>> It would be great if we can have Spark 2.4.4 before we get busier with 3.0.0. If it's okay, I'd like to volunteer as the 2.4.4 release manager and roll it next Monday (15th July). What do you think?
>>
>> Bests,
>> Dongjoon.
Re: Release Apache Spark 2.4.4 before 3.0.0
We will certainly want a 2.4.4 release eventually. In fact I'd expect 2.4.x gets maintained for longer than the usual 18 months, as it's the last 2.x branch. It doesn't need to happen before 3.0, but could. Usually maintenance releases happen 3-4 months apart and the last one was 2 months ago. If these are significant issues, sure. It'll probably be August before it's out anyway.

On Tue, Jul 9, 2019 at 11:15 AM Dongjoon Hyun wrote:
> Hi, All.
>
> Spark 2.4.3 was released two months ago (8th May).
>
> As of today (9th July), there are 45 fixes in `branch-2.4`, including the following correctness or blocker issues:
>
> - SPARK-26038 Decimal toScalaBigInt/toJavaBigInteger not work for decimals not fitting in long
> - SPARK-26045 Error in the spark 2.4 release package with the spark-avro_2.11 dependency
> - SPARK-27798 from_avro can modify variables in other rows in local mode
> - SPARK-27907 HiveUDAF should return NULL in case of 0 rows
> - SPARK-28157 Make SHS clear KVStore LogInfo for the blacklist entries
> - SPARK-28308 CalendarInterval sub-second part should be padded before parsing
>
> It would be great if we can have Spark 2.4.4 before we get busier with 3.0.0. If it's okay, I'd like to volunteer as the 2.4.4 release manager and roll it next Monday (15th July). What do you think?
>
> Bests,
> Dongjoon.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
Release Apache Spark 2.4.4 before 3.0.0
Hi, All.

Spark 2.4.3 was released two months ago (8th May).

As of today (9th July), there are 45 fixes in `branch-2.4`, including the following correctness or blocker issues:

- SPARK-26038 Decimal toScalaBigInt/toJavaBigInteger not work for decimals not fitting in long
- SPARK-26045 Error in the spark 2.4 release package with the spark-avro_2.11 dependency
- SPARK-27798 from_avro can modify variables in other rows in local mode
- SPARK-27907 HiveUDAF should return NULL in case of 0 rows
- SPARK-28157 Make SHS clear KVStore LogInfo for the blacklist entries
- SPARK-28308 CalendarInterval sub-second part should be padded before parsing

It would be great if we can have Spark 2.4.4 before we get busier with 3.0.0. If it's okay, I'd like to volunteer as the 2.4.4 release manager and roll it next Monday (15th July). What do you think?

Bests,
Dongjoon.
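For readers unfamiliar with SPARK-26038: the class of bug is a narrowing conversion through `long` that silently overflows for decimals whose integral part exceeds `Long.MAX_VALUE`. The snippet below is a plain-Java illustration of that failure mode, not Spark's actual `Decimal` code:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalOverflowDemo {
    public static void main(String[] args) {
        // 10^20 does not fit in a signed 64-bit long
        // (Long.MAX_VALUE is about 9.22 * 10^18).
        BigDecimal big = new BigDecimal("100000000000000000000");

        // Converting via longValue() keeps only the low-order 64 bits,
        // silently producing a wrong value: the bug pattern.
        BigInteger viaLong = BigInteger.valueOf(big.longValue());

        // Converting directly preserves the exact value: the fix pattern.
        BigInteger exact = big.toBigInteger();

        System.out.println(viaLong.equals(exact)); // prints "false"
    }
}
```

The danger for users is that the overflowing path raises no error; it is exactly the kind of silent correctness issue that motivates cutting 2.4.4 before 3.0.0.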
Re: Contribution help needed for sub-tasks of an umbrella JIRA - port *.sql tests to improve coverage of Python, Pandas, Scala UDF cases
It's alright, thanks for that. Anyone can take a look. This is an open source project :D.

On Tue, Jul 9, 2019 at 8:18 PM Stavros Kontopoulos <stavros.kontopou...@lightbend.com> wrote:
> I can try one and see how it goes, although I am not familiar with the area.
>
> Stavros
>
> On Tue, Jul 9, 2019 at 6:17 AM Hyukjin Kwon wrote:
>> Hi all,
>>
>> I am currently aiming to improve the Python, pandas and Scala UDF test cases by integrating our existing *.sql files; see https://issues.apache.org/jira/browse/SPARK-27921
>>
>> I would appreciate it if anyone who's interested in contributing to Spark took some of the sub-tasks. There are too many for me to do alone :-); I am doing them one by one for now.
>>
>> I wrote some guides for this umbrella JIRA specifically, so if you follow them closely, one by one, the process itself isn't that difficult.
>>
>> The most important guideline, which should be addressed carefully, is:
>>> 7. If there is a diff, analyze it, file or find the JIRA, and skip the tests with comments.
>>
>> Thanks!
Re: Contribution help needed for sub-tasks of an umbrella JIRA - port *.sql tests to improve coverage of Python, Pandas, Scala UDF cases
I can try one and see how it goes, although I am not familiar with the area.

Stavros

On Tue, Jul 9, 2019 at 6:17 AM Hyukjin Kwon wrote:
> Hi all,
>
> I am currently aiming to improve the Python, pandas and Scala UDF test cases by integrating our existing *.sql files; see https://issues.apache.org/jira/browse/SPARK-27921
>
> I would appreciate it if anyone who's interested in contributing to Spark took some of the sub-tasks. There are too many for me to do alone :-); I am doing them one by one for now.
>
> I wrote some guides for this umbrella JIRA specifically, so if you follow them closely, one by one, the process itself isn't that difficult.
>
> The most important guideline, which should be addressed carefully, is:
>> 7. If there is a diff, analyze it, file or find the JIRA, and skip the tests with comments.
>
> Thanks!
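Step 7 of the guide (analyze any diff between the ported UDF test's output and the original *.sql golden output) can be sketched generically. This is an illustrative helper, not Spark's actual `SQLQueryTestSuite` harness, and the file contents below are made up:

```java
import java.util.List;

public class GoldenDiff {
    // Return the index of the first line where the golden output and the
    // freshly generated output disagree, or -1 if they match exactly.
    // A non-negative index is the cue to analyze the diff, file or find
    // a JIRA, and skip the test with a comment.
    static int firstMismatch(List<String> golden, List<String> actual) {
        int n = Math.min(golden.size(), actual.size());
        for (int i = 0; i < n; i++) {
            if (!golden.get(i).equals(actual.get(i))) {
                return i;
            }
        }
        // Same prefix: a length difference is also a mismatch.
        return golden.size() == actual.size() ? -1 : n;
    }

    public static void main(String[] args) {
        List<String> golden = List.of("-- query: SELECT udf(1)", "1");
        List<String> actual = List.of("-- query: SELECT udf(1)", "NULL");
        System.out.println(firstMismatch(golden, actual)); // prints 1
    }
}
```

A mismatch like the `1` vs `NULL` line above is exactly the kind of behavioural difference between plain SQL and the UDF-wrapped variant that the umbrella JIRA asks contributors to investigate.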
Re: Opinions wanted: how much to match PostgreSQL semantics?
Thank you, Sean and all. One decision was made swiftly today. I believe we can move forward case-by-case on the others until the feature freeze (3.0 branch cut).

Bests,
Dongjoon.

On Mon, Jul 8, 2019 at 13:03 Marco Gaido wrote:
> Hi Sean,
>
> Thanks for bringing this up. Honestly, my opinion is that Spark should be fully ANSI SQL compliant. Where ANSI SQL compliance is not an issue, I am fine with following any other DB. IMHO, we won't get 100% compliance with any DB anyway, postgres in this case (e.g. for decimal operations we are following SQL Server, and postgres behaviour would be very hard to match), so I think it is fine for PMC members to decide for each feature whether it is worth supporting or not.
>
> Thanks,
> Marco
>
> On Mon, 8 Jul 2019, 20:09 Sean Owen wrote:
>> See the particular issue/question at https://github.com/apache/spark/pull/24872#issuecomment-509108532 and the larger umbrella at https://issues.apache.org/jira/browse/SPARK-27764 -- Dongjoon rightly suggests this is a broader question.