rdblue commented on code in PR #5672:
URL: https://github.com/apache/iceberg/pull/5672#discussion_r961879301
##########
docs/python-api-intro.md:
##########
@@ -27,158 +27,152 @@ menu:
# Iceberg Python API
-Much of the python api conforms to the java api. You can get more info about the java api [here](../api).
+Much of the Python API conforms to the Java API. You can get more info about the Java API [here](../api).
-## Catalog
-
-The Catalog interface, like java provides search and management operations for tables.
-
-To create a catalog:
+## Install
-``` python
-from iceberg.hive import HiveTables
+You can install the latest release version from pypi:
-# instantiate Hive Tables
-conf = {"hive.metastore.uris": 'thrift://{hms_host}:{hms_port}',
- "hive.metastore.warehouse.dir": {tmpdir} }
-tables = HiveTables(conf)
+```sh
+pip3 install "pyiceberg[s3fs,hive]"
```
-and to create a table from a catalog:
-
-``` python
-from iceberg.api.schema import Schema\
-from iceberg.api.types import TimestampType, DoubleType, StringType, NestedField
-from iceberg.api.partition_spec import PartitionSpecBuilder
-
-schema = Schema(NestedField.optional(1, "DateTime", TimestampType.with_timezone()),
-                NestedField.optional(2, "Bid", DoubleType.get()),
-                NestedField.optional(3, "Ask", DoubleType.get()),
-                NestedField.optional(4, "symbol", StringType.get()))
-partition_spec = PartitionSpecBuilder(schema).add(1, 1000, "DateTime_day", "day").build()
+Or install the latest development version locally:
-tables.create(schema, "test.test_123", partition_spec)
```
-
-
-## Tables
-
-The Table interface provides access to table metadata
-
-+ schema returns the current table `Schema`
-+ spec returns the current table `PartitonSpec`
-+ properties returns a map of key-value `TableProperties`
-+ currentSnapshot returns the current table `Snapshot`
-+ snapshots returns all valid snapshots for the table
-+ snapshot(id) returns a specific snapshot by ID
-+ location returns the table’s base location
-
-Tables also provide refresh to update the table to the latest version.
-
-### Scanning
-Iceberg table scans start by creating a `TableScan` object with `newScan`.
-
-``` python
-scan = table.new_scan();
+pip3 install poetry --upgrade
+pip3 install -e ".[s3fs,hive]"
```
-To configure a scan, call filter and select on the `TableScan` to get a new `TableScan` with those changes.
-
-``` python
-filtered_scan = scan.filter(Expressions.equal("id", 5))
-```
+With optional dependencies:
-String expressions can also be passed to the filter method.
+| Key       | Description                                                           |
+|-----------|-----------------------------------------------------------------------|
+| hive      | Support for the Hive metastore                                        |
+| pyarrow   | PyArrow as a FileIO implementation to interact with the object store |
+| s3fs      | S3FS as a FileIO implementation to interact with the object store    |
+| zstandard | Support for zstandard Avro compression                                |
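
As a rough sketch of why the `hive` and `s3fs` extras matter in practice, the snippet below connects to a Hive metastore and loads a table through PyIceberg's catalog entry point. The `load_catalog` call, the property keys, and the table name here are illustrative assumptions, not part of this change.

```python
from pyiceberg.catalog import load_catalog  # Thrift metastore support comes from the "hive" extra

# Hypothetical connection properties; point the URI at your own metastore.
catalog = load_catalog(
    "default",
    **{
        "uri": "thrift://localhost:9083",        # Hive metastore endpoint
        "s3.endpoint": "http://localhost:9000",  # only relevant when data lives on S3 ("s3fs" extra)
    },
)

# Load a table and print its current schema.
table = catalog.load_table("examples.nyc_taxi")
print(table.schema())
```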
Review Comment:
Should we make this install by default? Seems like a good one for a hard
dependency
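
On the question above: the `zstandard` extra maps to the `zstandard` package on PyPI, which backs the zstd Avro compression support listed in the table. A standalone sketch of that library's API (not PyIceberg's actual code path), for illustration:

```python
import zstandard  # provided by the "zstandard" extra

def decompress_block(raw: bytes) -> bytes:
    """Decompress a zstd-compressed payload, e.g. an Avro block."""
    return zstandard.ZstdDecompressor().decompress(raw)

# Round-trip example to show the dependency in action.
payload = zstandard.ZstdCompressor().compress(b"avro block bytes")
assert decompress_block(payload) == b"avro block bytes"
```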
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]