[ https://issues.apache.org/jira/browse/CARBONDATA-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-3255.
----------------------------------
    Fix Version/s: 2.0.0
       Resolution: Fixed

> CarbonData provides a Python interface to write and read structured and
> unstructured data in CarbonData
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-3255
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3255
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: Bo Xu
>            Assignee: Bo Xu
>            Priority: Major
>             Fix For: 2.0.0
>
>          Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> Apache CarbonData already provides Java, Scala, and C++ interfaces for users.
> As more and more people use Python to manage and analyze big data, it would
> be useful to provide a Python interface for writing and reading structured
> and unstructured data in CarbonData, such as String, int, and binary data
> (image/voice/video). It should not depend on Apache Spark. We call it
> PYSDK.
> PYSDK is based on the CarbonData Java SDK and uses pyjnius to call Java code
> from Python. Although Apache Spark uses py4j in PySpark to call Java code from
> Python, py4j performs poorly when reading big data in CarbonData format from
> Python, and py4j's own documentation also reports low performance when
> transferring large amounts of data:
> https://www.py4j.org/advanced_topics.html#performance. JPype is another
> popular tool for calling Java code from Python, but it stopped being updated
> several years ago, so we cannot use it. In our tests, pyjnius showed high
> performance when reading big data by calling Java code from Python, so it is
> a good choice for us.
> We have already been working on this feature for several months in
> https://github.com/xubo245/pycarbon
> Goals:
> 1. PYSDK should provide an interface to read data
> 2. PYSDK should provide an interface to write data
> 3. PYSDK should support basic data types
> 4. PYSDK should support projection
> 5. PYSDK should support filter



--
This message was sent by Atlassian Jira
(v8.3.4#803005)