Re: First Sedona release

2020-12-09 Thread Jia Yu
Hi Felix, Jim, Netanel, and other Sedona committers,

As you know, my JTS PR has been accepted into JTS 1.18-SNAPSHOT and we are
waiting for the official release of JTS 1.18 on Maven. However, I have not
seen a clear date for when JTS 1.18 will be published. I guess this will
take one or two months to happen.

Currently, the Sedona 1.0.0 release is blocked by this issue (Maven Central
does not allow SNAPSHOT dependencies). Since we are eager to publish Sedona
1.0.0 as soon as possible, I propose copying the latest JTS source code into
Sedona-core for our 1.0.0 release. In a future release (say Sedona 1.0.1),
we can drop the JTS source code and use their Maven release. The JTS source
code is dual-licensed under the Eclipse Public License 2.0 and the Eclipse
Distribution License 1.0 (a BSD-style license), so it is safe to keep it in
Sedona.

What do you think? @Jim Hughes, is this a good idea?

Thanks,
Jia

On Fri, Dec 4, 2020 at 10:43 PM Jia Yu wrote:

> Hi Netanel,
>
> So for Sedona SQL 1.0.0 on Spark 2.4, we can use
> "sedona-sql_2.11-2.4-1.0.0-incubator", right?
>
> Sedona 1.0 on Spark 2.4 and 3.0 will be compiled against Scala 2.11 and
> 2.12. I believe this can be done via different compilation targets in Maven.
>
> I am currently looking at whether I can do conditional compilation using
> Maven (similar to C++ #ifdef) because there is a change in Aggregator in
> Spark 3.0. Otherwise I will always need to maintain a separate branch for
> Sedona on Spark 2.4.
>
> It looks OK to me.
>
> On Fri, Dec 4, 2020 at 1:12 AM Netanel Malka wrote:
>
>> Hi,
>> I think that we can follow the Apache Spark convention, as you can see
>> here.
>> For example:
>> sedona-sql_2.11-2.4, where 2.11 -> Scala version and 2.4 -> Spark version
>>
>> What do you think?
>>
>>
>> On Fri, 4 Dec 2020 at 10:34, Jia Yu wrote:
>>
>>> Dear all,
>>>
>>> The current status:
>>> 1. The move-to-JTS PR has been merged to the master branch. If JTS 1.18
>>> gets published in a few weeks, we will use the latest JTS. Otherwise, we
>>> still need to use my fork for this release. But the Sedona API is now
>>> finalized. From the user's perspective, using my fork or the official JTS
>>> release should not make any difference.
>>> 2. The Sedona doc update is in progress. I am halfway there. You can track
>>> the progress here: https://github.com/apache/incubator-sedona/pull/493
>>> 3. I will create a separate branch to test Spark 2.4 over this weekend.
>>> 4. Pawel is working on his improvement to the RDD-SQL Python adapter.
>>>
>>> Question:
>>>
>>> What is the most appropriate Maven artifact name for Sedona on Spark
>>> 2.4? I used to put "sedona-sql_2.4", but it looks like the "_2.4" suffix
>>> is usually reserved for specifying the Scala version. How about
>>> "sedona-sql-spark2"? Should we also use "sedona-sql-spark3" for Spark 3.0?
>>>
>>> Thanks,
>>> Jia
>>>
>>> On Tue, Nov 24, 2020 at 8:16 AM Jim Hughes wrote:
>>>
>>>> Hi all,
>>>>
>>>> Felix, good to know that a WIP disclaimer is standard practice and will
>>>> let things move forward!
>>>>
>>>> Jia, I believe that page is explaining that a portion of the code in
>>>> various GeoTools modules has other licenses on it.  As such, gt-main is
>>>> mostly LGPL with some BSD code as well.
>>>>
>>>> Cheers,
>>>>
>>>> Jim
>>>>
>>>> On 11/23/2020 9:50 PM, Jia Yu wrote:
>>>> > Thank you, Felix. I will use the WIP disclaimer.
>>>> >
>>>> > To answer Jim's question, GeoTools components use different licenses:
>>>> > https://docs.geotools.org/latest/userguide/welcome/license.html
>>>> >
>>>> > GT-main uses BSD, so its binary can be included in Sedona's release.
>>>> > Other components in GeoTools use LGPL, but Sedona only uses them for CRS
>>>> > transformation. I already set the dependency scope to "provided" in
>>>> > Sedona's POM.xml. If a user wants to use CRS transformation in Sedona,
>>>> > they will have to add some GeoTools library by themselves.
>>>> >
>>>> >
>>>> > On Mon, Nov 23, 2020 at 6:24 PM Felix Cheung wrote:
>>>> >
>>>> >> On Mon, Nov 23, 2020 at 6:03 PM Felix Cheung wrote:
>>>> >>
>>>> >>> I’d strongly recommend the community to move towards the first release
>>>> >>> with the WIP disclaimer
>>>> >>>
>>>> >>> https://incubator.apache.org/policy/incubation.html#work_in_progress_disclaimer
>>>> >>> https://incubator.apache.org/policy/incubation.html#releases
>>>> >>>
>>>> >>>
>>>> >>> As for the LGPL dependency specifically, a replacement will be needed?
>>>> >>>
>>>> >>
>>>> >> To clarify, ok to note in the WIP disclaimer- so it can be released with
>>>> >> this.
>>>> >>
>>>> >>
>>>> >>
>>>> >>> On Mon, Nov 23, 2020 at 11:15 AM Jim Hughes wrote:
>>>> >>>
>>>> >>>> Hi all,
>>>> >>>>
>>>> >>>> Has the fact that one of the dependencies is LGPL (GeoTools) been
>>>> >>>> discussed / addressed?  (See
>>>> >>>> https://www.apache.org/legal/resolved.html#category-x)
>>>> >>>>
>>>> >>>> I'm asking since 

[jira] [Closed] (SEDONA-7) Build the source code towards Spark 2.4, 3.0 and Scala 2.11 and 2.12

2020-12-09 Thread Jia Yu (Jira)


 [ https://issues.apache.org/jira/browse/SEDONA-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jia Yu closed SEDONA-7.
---
Resolution: Fixed

> Build the source code towards Spark 2.4, 3.0 and Scala 2.11 and 2.12
> 
>
> Key: SEDONA-7
> URL: https://issues.apache.org/jira/browse/SEDONA-7
> Project: Apache Sedona
> Issue Type: Improvement
> Reporter: Jia Yu
> Priority: Major
> Labels: pull-request-available
>
> As stated in the title



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [incubator-sedona] jiayuasu merged pull request #494: [SEDONA-7] Build Sedona for Spark 2.4, 3.0 and Scala 2.11, 2.12

2020-12-09 Thread GitBox


jiayuasu merged pull request #494:
URL: https://github.com/apache/incubator-sedona/pull/494


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [incubator-sedona] jiayuasu commented on pull request #494: [SEDONA-7] Build Sedona for Spark 2.4, 3.0 and Scala 2.11, 2.12

2020-12-09 Thread GitBox


jiayuasu commented on pull request #494:
URL: https://github.com/apache/incubator-sedona/pull/494#issuecomment-742123181


   @Imbruced I have fixed the Python 3.9 issue. It turns out that we only need 
to do `sudo apt-get install libgeos-dev`. Now I will merge the PR. You can go 
ahead and open a PR for your faster Adapter.
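
For illustration, a minimal Python sketch of checking whether the GEOS C
library is discoverable before running the Python tests. It relies on the
standard-library `ctypes.util.find_library`; the helper names `find_geos` and
`geos_status` are hypothetical, and this is not necessarily the lookup that
shapely or the Sedona Python package performs internally.

```python
import ctypes.util


def find_geos(finder=ctypes.util.find_library):
    """Return the name or path of the GEOS C library, or None if not found.

    `finder` is injectable so the check can be exercised without GEOS installed.
    """
    # "geos_c" is the library named in the OSError reported for Python 3.9.
    return finder("geos_c")


def geos_status(finder=ctypes.util.find_library):
    """Describe whether GEOS was located, with the fix suggested in this thread."""
    found = find_geos(finder)
    if found is None:
        # Assumption: a Debian/Ubuntu environment, as implied by apt-get.
        return ("geos_c not found; on Debian/Ubuntu try: "
                "sudo apt-get install libgeos-dev")
    return f"geos_c found: {found}"


if __name__ == "__main__":
    print(geos_status())
```

Running the script on a CI image before the test step would show whether the
system package is still missing; the result depends on the host, so no output
is shown here.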







[jira] [Closed] (SEDONA-9) Test Issue notification to dev mailing list

2020-12-09 Thread Jia Yu (Jira)


 [ https://issues.apache.org/jira/browse/SEDONA-9?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jia Yu closed SEDONA-9.
---
Resolution: Fixed

This is a test. The dev mailing list should receive the close ticket
notification.

> Test Issue notification to dev mailing list
> ---
>
> Key: SEDONA-9
> URL: https://issues.apache.org/jira/browse/SEDONA-9
> Project: Apache Sedona
> Issue Type: Test
> Reporter: Jia Yu
> Priority: Major
>
> This is a test. The dev mailing list should receive notification





[jira] [Created] (SEDONA-9) Test Issue notification to dev mailing list

2020-12-09 Thread Jia Yu (Jira)
Jia Yu created SEDONA-9:
---

    Summary: Test Issue notification to dev mailing list
        Key: SEDONA-9
        URL: https://issues.apache.org/jira/browse/SEDONA-9
    Project: Apache Sedona
 Issue Type: Test
   Reporter: Jia Yu


This is a test. The dev mailing list should receive notification





[GitHub] [incubator-sedona] Imbruced commented on pull request #494: [SEDONA-7] Build Sedona for Spark 2.4, 3.0 and Scala 2.11, 2.12

2020-12-09 Thread GitBox


Imbruced commented on pull request #494:
URL: https://github.com/apache/incubator-sedona/pull/494#issuecomment-742069776


   @jiayuasu That may be because of shapely or geopandas. I will take a look
to see whether it can be fixed in a short amount of time. Also, Python 3.9 is
a fresh release; please look at the PySpark download statistics:
   
![image](https://user-images.githubusercontent.com/22958216/101688858-ec48b900-3a6c-11eb-86b0-511566f074a4.png)
   







[GitHub] [incubator-sedona] jiayuasu commented on pull request #494: [SEDONA-7] Build Sedona for Spark 2.4, 3.0 and Scala 2.11, 2.12

2020-12-09 Thread GitBox


jiayuasu commented on pull request #494:
URL: https://github.com/apache/incubator-sedona/pull/494#issuecomment-741675706


   @Imbruced I have successfully made Sedona run on Spark 2.4.7 + Python 3.7. 
In fact, I am glad that the test failed before.
   
   There was a bug in the root pom.xml (sedona-parent). It packaged a wrong 
jackson dependency into the compiled Sedona jar. It was introduced by PR 
https://github.com/apache/incubator-sedona/pull/471
   
   This bug will sometimes cause Sedona (Scala / Java / Python) to fail in 
Spark cluster mode. Once I removed this dependency, all tests passed. Now, as 
you can see in the GitHub CI test result, 6 checks have passed.
   
   The only thing left is the test on Spark 3.0.1 + Python 3.9. Based on my 
initial test https://github.com/apache/incubator-sedona/runs/1521112458  , the 
error is `OSError: Could not find library geos_c or load any of its variants 
['libgeos_c.so.1', 'libgeos_c.so']`  It looks like some of the Sedona Python 
packages need to be updated.
   
   If you think Spark 3.0.1 + Python 3.9 is something easy to fix, please let 
me know the solution. If you think this will take some time, I will directly 
merge this PR and leave Python 3.9 support for future work.
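
One way to spot an unwanted Jackson bundled into a built jar is to list the
entries under Jackson's package prefix. The following is a minimal Python
sketch using only the standard-library `zipfile` module; the helper names and
the demo jar contents are hypothetical, and it assumes the classes sit under
the usual `com/fasterxml/jackson/` prefix (a relocated or shaded copy would
live elsewhere and not be detected).

```python
import io
import zipfile

# Assumption: Jackson classes are packaged under their standard prefix.
JACKSON_PREFIX = "com/fasterxml/jackson/"


def bundled_jackson_entries(jar):
    """List class-file entries under the Jackson prefix in a jar.

    `jar` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(jar) as zf:
        return sorted(
            name for name in zf.namelist()
            if name.startswith(JACKSON_PREFIX) and name.endswith(".class")
        )


def make_demo_jar(entries):
    """Build an in-memory jar with empty files, for demonstration only."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in entries:
            zf.writestr(name, b"")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    demo = make_demo_jar([
        "org/apache/sedona/Demo.class",  # hypothetical entry
        "com/fasterxml/jackson/core/JsonFactory.class",
    ])
    for entry in bundled_jackson_entries(demo):
        print(entry)
```

Pointing `bundled_jackson_entries` at the path of a compiled jar, rather than
the in-memory demo, gives a quick check that the dependency is no longer
packaged.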







[GitHub] [incubator-sedona] jiayuasu commented on pull request #494: [SEDONA-7] Build Sedona for Spark 2.4, 3.0 and Scala 2.11, 2.12

2020-12-09 Thread GitBox


jiayuasu commented on pull request #494:
URL: https://github.com/apache/incubator-sedona/pull/494#issuecomment-741636788


   @Imbruced 
   
   The test on PySpark 2.4.7 + Python 3.7 still failed. Please see 
https://github.com/apache/incubator-sedona/runs/1522711633?check_suite_focus=true
   
   It uses the correct PySpark version 2.4.7 and Spark binary version 2.4.7. I 
used `pipenv graph` to print out all installed packages. PySpark in the Pipfile 
is also set to `>=2.4.0`.
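
As a complement to `pipenv graph`, the installed PySpark version can be
compared against the `>=2.4.0` constraint from inside the environment. This is
a minimal sketch with hypothetical helper names; it handles only simple
dotted numeric versions and is not a full PEP 440 implementation, so pipenv's
own resolution remains authoritative.

```python
from importlib import metadata


def parse_version(text):
    """Parse the leading numeric components of a version string into a tuple."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. "dev0"
        parts.append(int(digits))
    return tuple(parts)


def satisfies_minimum(installed, minimum="2.4.0"):
    """Return True if `installed` is at least `minimum` (numeric parts only)."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "2.4" compares equal to "2.4.0".
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_pyspark_version():
    """Return the installed PySpark version string, or None if not installed."""
    try:
        return metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_pyspark_version()
    if version is None:
        print("pyspark is not installed in this environment")
    else:
        print(version, "satisfies >=2.4.0:", satisfies_minimum(version))
```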


