This is an automated email from the ASF dual-hosted git repository.

ashvin pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-xtable.git

commit 29918115b08d4878284c6ab6bf1f96646cdbec83
Author: Kyle Weller <[email protected]>
AuthorDate: Thu Mar 7 23:40:32 2024 -0800

    removing unnecessary new lines
---
 README.md | 46 ++++++++++++++++++++--------------------------
 1 file changed, 20 insertions(+), 26 deletions(-)

diff --git a/README.md b/README.md
index cc81f618..ee57016d 100644
--- a/README.md
+++ b/README.md
@@ -14,18 +14,17 @@ of a few interfaces, which we believe will facilitate the expansion of supported
 future.
 
 # Building the project and running tests.
-1. Use Java11 for building the project. If you are using some other java version, you can
-   use [jenv](https://github.com/jenv/jenv) to use multiple java versions locally.
+1. Use Java11 for building the project. If you are using some other java version, you can use [jenv](https://github.com/jenv/jenv) to use multiple java versions locally.
 2. Build the project using `mvn clean package`. Use `mvn clean package -DskipTests` to skip tests while building.
-3. Use `mvn clean test` or `mvn test` to run all unit tests. If you need to run only a specific test you can do this by
-   something like `mvn test -Dtest=TestDeltaSync -pl core`.
+3. Use `mvn clean test` or `mvn test` to run all unit tests. If you need to run only a specific test you can do this
+by something like `mvn test -Dtest=TestDeltaSync -pl core`.
 4. Similarly, use `mvn clean verify` or `mvn verify` to run integration tests.
 
 # Style guide
-1. We use [Maven Spotless plugin](https://github.com/diffplug/spotless/tree/main/plugin-maven) and
+1. We use [Maven Spotless plugin](https://github.com/diffplug/spotless/tree/main/plugin-maven) and 
    [Google java format](https://github.com/google/google-java-format) for code style.
-2. Use `mvn spotless:check` to find out code style violations and `mvn spotless:apply` to fix them. Code style check is
-   tied to compile phase by default, so code style violations will lead to build failures.
+2. Use `mvn spotless:check` to find out code style violations and `mvn spotless:apply` to fix them. 
+Code style check is tied to compile phase by default, so code style violations will lead to build failures.
 
 # Running the bundled jar
 1. Get a pre-built bundled jar or create the jar with `mvn install -DskipTests`
@@ -36,17 +35,21 @@ targetFormats:
   - DELTA
   - ICEBERG
 datasets:
-  - tableBasePath: s3://tpc-ds-datasets/1GB/hudi/call_center
+  -
+    tableBasePath: s3://tpc-ds-datasets/1GB/hudi/call_center
     tableDataPath: s3://tpc-ds-datasets/1GB/hudi/call_center/data
     tableName: call_center
     namespace: my.db
-  - tableBasePath: s3://tpc-ds-datasets/1GB/hudi/catalog_sales
+  -
+    tableBasePath: s3://tpc-ds-datasets/1GB/hudi/catalog_sales
     tableName: catalog_sales
     partitionSpec: cs_sold_date_sk:VALUE
-  - tableBasePath: s3://hudi/multi-partition-dataset
+  -
+    tableBasePath: s3://hudi/multi-partition-dataset
     tableName: multi_partition_dataset
     partitionSpec: time_millis:DAY:yyyy-MM-dd,type:VALUE
-  - tableBasePath: abfs://[email protected]/multi-partition-dataset
+  -
+    tableBasePath: abfs://[email protected]/multi-partition-dataset
     tableName: multi_partition_dataset
 ```
 - `sourceFormat`  is the format of the source table that you want to convert
@@ -54,9 +57,7 @@ datasets:
 - `tableBasePath` is the basePath of the table
 - `tableDataPath` is an optional field specifying the path to the data files. If not specified, the tableBasePath will be used. For Iceberg source tables, you will need to specify the `/data` path.
 - `namespace` is an optional field specifying the namespace of the table and will be used when syncing to a catalog.
-- `partitionSpec` is a spec that allows us to infer partition values. This is only required for Hudi source tables. If
-  the table is not partitioned, leave it blank. If it is partitioned, you can specify a spec with a comma separated list
-  with format `path:type:format`
+- `partitionSpec` is a spec that allows us to infer partition values. This is only required for Hudi source tables. If the table is not partitioned, leave it blank. If it is partitioned, you can specify a spec with a comma separated list with format `path:type:format`
     - `path` is a dot separated path to the partition field
     - `type` describes how the partition value was generated from the column value
         - `VALUE`: an identity transform of field value to partition value
@@ -64,10 +65,8 @@ datasets:
         - `MONTH`: same as `YEAR` but with month granularity
         - `DAY`: same as `YEAR` but with day granularity
         - `HOUR`: same as `YEAR` but with hour granularity
-    - `format`: if your partition type is `YEAR`, `MONTH`, `DAY`, or `HOUR` specify the format for the date string as it
-      appears in your file paths
-3. The default implementations of table format clients can be replaced with custom implementations by specifying a
-   client configs yaml file in the format below:
+    - `format`: if your partition type is `YEAR`, `MONTH`, `DAY`, or `HOUR` specify the format for the date string as it appears in your file paths
+3. The default implementations of table format clients can be replaced with custom implementations by specifying a client configs yaml file in the format below:
 ```yaml
 # sourceClientProviderClass: The class name of a table format's client factory, where the client is
 #     used for reading from a table of this format. All user configurations, including hadoop config
@@ -85,8 +84,7 @@ tableFormatsClients:
         spark.master: local[2]
         spark.app.name: onetableclient
 ```
-4. A catalog can be used when reading and updating Iceberg tables. The catalog can be specified in a yaml file and
-   passed in with the `--icebergCatalogConfig` option. The format of the catalog config file is:
+4. A catalog can be used when reading and updating Iceberg tables. The catalog can be specified in a yaml file and passed in with the `--icebergCatalogConfig` option. The format of the catalog config file is:
 ```yaml
 catalogImpl: io.my.CatalogImpl
 catalogName: name
@@ -94,8 +92,7 @@ catalogOptions: # all other options are passed through in a map
   key1: value1
   key2: value2
 ```
-5. run
-   with `java -jar utilities/target/utilities-0.1.0-SNAPSHOT-bundled.jar --datasetConfig my_config.yaml [--hadoopConfig hdfs-site.xml] [--clientsConfig clients.yaml] [--icebergCatalogConfig catalog.yaml]`
+5. run with `java -jar utilities/target/utilities-0.1.0-SNAPSHOT-bundled.jar --datasetConfig my_config.yaml [--hadoopConfig hdfs-site.xml] [--clientsConfig clients.yaml] [--icebergCatalogConfig catalog.yaml]`
    The bundled jar includes hadoop dependencies for AWS, Azure, and GCP. Authentication for AWS is done with
    `com.amazonaws.auth.DefaultAWSCredentialsProviderChain`. To override this setting, specify a different implementation
    with the `--awsCredentialsProvider` option.
@@ -108,10 +105,7 @@ For setting up the repo on IntelliJ, open the project and change the java versio
 You have found a bug, or have a cool idea you that want to contribute to the project ? Please file a GitHub issue [here](https://github.com/onetable-io/onetable/issues)
 
 ## Adding a new target format
-Adding a new target format requires a developer
-implement [TargetClient](./api/src/main/java/io/onetable/spi/sync/TargetClient.java). Once you have implemented that
-interface, you can integrate it into the [OneTableClient](./core/src/main/java/io/onetable/client/OneTableClient.java).
-If you think others may find that target useful, please raise a Pull Request to add it to the project.
+Adding a new target format requires a developer implement [TargetClient](./api/src/main/java/io/onetable/spi/sync/TargetClient.java). Once you have implemented that interface, you can integrate it into the [OneTableClient](./core/src/main/java/io/onetable/client/OneTableClient.java). If you think others may find that target useful, please raise a Pull Request to add it to the project.
 
 ## Overview of the sync process
 ![img.png](assets/images/sync_flow.jpg)
