[jira] [Created] (SEDONA-455) Add a new data source namely geoparquet.metadata

2023-12-26 Thread Jia Yu (Jira)
Jia Yu created SEDONA-455:
-

 Summary: Add a new data source namely geoparquet.metadata
 Key: SEDONA-455
 URL: https://issues.apache.org/jira/browse/SEDONA-455
 Project: Apache Sedona
  Issue Type: New Feature
Reporter: Jia Yu


Can we add a new data source that reads only the file-level metadata of a Parquet 
file? This is crucial for entry-level users exploring an unknown Parquet file, 
including GeoParquet. In the GeoParquet case, it will let users see the 
projjson value, since we are not able to properly parse it into a known EPSG code.

I understand that a Spark DataFrame carries only its schema as metadata, 
which cannot be used to hold such information.

So I suggest that we add a new data source named {{geoparquet.metadata}}, 
which loads this metadata using {{ParquetFileReader}} 
(https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java).
One good example is DuckDB: 
https://duckdb.org/docs/data/parquet/metadata.html
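
To make the idea concrete, here is a minimal sketch of the kind of rows such a data source might expose. It assumes the footer's key-value metadata has already been read (for example via ParquetFileReader) and that its "geo" entry follows the GeoParquet layout ("version", "primary_column", "columns"). The function name, the output field names, and the sample value are hypothetical and not part of Sedona.

```python
import json


def summarize_geo_metadata(geo_json):
    """Flatten a GeoParquet "geo" metadata string into one row per geometry column.

    Hypothetical helper for illustration; not an existing Sedona API.
    """
    geo = json.loads(geo_json)
    rows = []
    for name, col in geo.get("columns", {}).items():
        crs = col.get("crs")
        rows.append({
            "column": name,
            "primary": name == geo.get("primary_column"),
            "version": geo.get("version"),
            "encoding": col.get("encoding"),
            "geometry_types": col.get("geometry_types", []),
            # The PROJJSON is passed through untouched; no EPSG lookup is attempted.
            "crs_name": crs.get("name") if isinstance(crs, dict) else None,
            "crs": crs,
        })
    return rows


# Hand-written sample value; the "crs" object is heavily abbreviated PROJJSON.
SAMPLE_GEO = json.dumps({
    "version": "1.0.0",
    "primary_column": "geometry",
    "columns": {
        "geometry": {
            "encoding": "WKB",
            "geometry_types": ["Point"],
            "crs": {"type": "GeographicCRS", "name": "WGS 84"},
        }
    },
})

for row in summarize_geo_metadata(SAMPLE_GEO):
    print(row["column"], row["encoding"], row["crs_name"])
```

Returning the raw "crs" object alongside a short name would let users inspect the projjson value directly, which is the main motivation above.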



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [SEDONA-450] Support Spark 3.5 [sedona]

2023-12-26 Thread via GitHub


jiayuasu merged PR #1161:
URL: https://github.com/apache/sedona/pull/1161


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@sedona.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (SEDONA-430) geoparquet writer should have an option called `writeToCrs`

2023-12-26 Thread Kristin Cowalcijk (Jira)


[ 
https://issues.apache.org/jira/browse/SEDONA-430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800687#comment-17800687
 ] 

Kristin Cowalcijk commented on SEDONA-430:
--

Pull request: https://github.com/apache/sedona/pull/1162

> geoparquet writer should have an option called `writeToCrs`
> ---
>
> Key: SEDONA-430
> URL: https://issues.apache.org/jira/browse/SEDONA-430
> Project: Apache Sedona
>  Issue Type: New Feature
>Reporter: Jia Yu
>Assignee: Kristin Cowalcijk
>Priority: Major
>
> This option should take a projjson string of the target CRS. Note that the 
> writer simply writes it to the metadata; it does not perform any actual check.





[PR] [SEDONA-429][SEDONA-430] Support specifying GeoParquet spec version number and CRS [sedona]

2023-12-26 Thread via GitHub


Kontinuation opened a new pull request, #1162:
URL: https://github.com/apache/sedona/pull/1162

   
   ## Did you read the Contributor Guide?
   
   - Yes, I have read [Contributor 
Rules](https://sedona.apache.org/latest-snapshot/community/rule/) and 
[Contributor Development 
Guide](https://sedona.apache.org/latest-snapshot/community/develop/)
   
   ## Is this PR related to a JIRA ticket?
   
   - Yes, the URLs of the associated JIRA tickets are 
https://issues.apache.org/jira/browse/SEDONA-429 and 
https://issues.apache.org/jira/browse/SEDONA-430. The PR name follows the 
format `[SEDONA-XXX] my subject`.
   
   ## What changes were proposed in this PR?
   
   * Bumped the default GeoParquet version number from `1.0.0-beta.1` to `1.0.0`
   * Allowed specifying the GeoParquet version number using the `geoparquet.version` 
option
   * Allowed specifying CRS metadata for geometry columns using the `geoparquet.crs` 
option
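
A minimal sketch of how the two options listed above might be passed from PySpark. The option names come from this PR; the helper name `write_geoparquet` is hypothetical, and `df` is assumed to be a Spark DataFrame with a geometry column in a session where Sedona is available. Passing a dict for the CRS is a convenience of this sketch only; the option itself takes a PROJJSON string.

```python
import json


def write_geoparquet(df, path, version="1.0.0", crs=None):
    """Write df as GeoParquet with the version and CRS options from this PR.

    Hypothetical helper for illustration; only the option names are from the PR.
    """
    writer = df.write.format("geoparquet").option("geoparquet.version", version)
    if crs is not None:
        # Serialize a dict to a PROJJSON string; strings are passed through as-is.
        crs_str = crs if isinstance(crs, str) else json.dumps(crs)
        writer = writer.option("geoparquet.crs", crs_str)
    writer.save(path)
```

When `crs` is omitted, the helper sets only `geoparquet.version` and leaves the CRS metadata to whatever the writer does by default.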
   
   ## How was this patch tested?
   
   Added new tests for GeoParquet metadata.
   
   ## Did this PR include necessary documentation updates?
   
   - Yes, I have updated the documentation.
   





[PR] [SEDONA-450] Support Spark 3.5 [sedona]

2023-12-26 Thread via GitHub


Kontinuation opened a new pull request, #1161:
URL: https://github.com/apache/sedona/pull/1161

   ## Did you read the Contributor Guide?
   
   - Yes, I have read [Contributor 
Rules](https://sedona.apache.org/latest-snapshot/community/rule/) and 
[Contributor Development 
Guide](https://sedona.apache.org/latest-snapshot/community/develop/)
   
   ## Is this PR related to a JIRA ticket?
   
   - Yes, the URL of the associated JIRA ticket is 
https://issues.apache.org/jira/browse/SEDONA-450. The PR name follows the 
format `[SEDONA-XXX] my subject`.
   
   ## What changes were proposed in this PR?
   
   Add a submodule `spark-3.5` to support reading/writing GeoParquet files in 
Spark 3.5
   
   ## How was this patch tested?
   
   Passing existing tests.
   
   ## Did this PR include necessary documentation updates?
   
   - Yes, I have updated the documentation.
   

