[ https://issues.apache.org/jira/browse/CARBONDATA-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-3255.
----------------------------------
    Fix Version/s: 2.0.0
       Resolution: Fixed

> CarbonData provides python interface to support to write and read structured
> and unstructured data in CarbonData
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-3255
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3255
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: Bo Xu
>            Assignee: Bo Xu
>            Priority: Major
>             Fix For: 2.0.0
>
>          Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> Apache CarbonData already provides Java, Scala, and C++ interfaces for users.
> More and more people use Python to manage and analyze big data, so it would
> be better to provide a Python interface that can write and read structured
> and unstructured data in CarbonData, such as String, int, and binary data
> (image/voice/video). It should not depend on Apache Spark. We call it PYSDK.
>
> PYSDK is based on the CarbonData Java SDK and uses pyjnius to call Java code
> from Python. Apache Spark uses py4j in PySpark to call Java code from Python,
> but py4j performs poorly when reading big data in CarbonData format from
> Python. The py4j project's own report also shows low performance when reading
> big data: https://www.py4j.org/advanced_topics.html#performance. JPype is
> another popular tool for calling Java code from Python, but it stopped being
> updated several years ago, so we cannot use it. In our tests, pyjnius showed
> high performance when reading big data by calling Java code from Python, so
> it is a good choice for us.
>
> We have already been working on this feature for several months in
> https://github.com/xubo245/pycarbon
>
> Goals:
> 1. PYSDK should provide an interface to read data
> 2. PYSDK should provide an interface to write data
> 3. PYSDK should support basic data types
> 4. PYSDK should support projection
> 5. PYSDK should support filter



--
This message was sent by Atlassian Jira
(v8.3.4#803005)