I found the demo setup in the "docker" directory not beginner-friendly. It
took some effort to digest what's there, and it's hard to play with.
I'd like to propose a scenario-based quickstart setup:

- Scenario 1: DeltaStreamer write
  - sample raw dataset, local FS
  - run DeltaStreamer with local Spark or Flink, writing to a COW or MOR table
- Scenario 2: meta sync
  - sample hoodie table (COW or MOR), local FS
  - run Hive sync against a local Hive server
- Scenario 3: SQL read
  - sample hoodie table (COW or MOR), local FS
  - run local Trino/Presto queries
- More scenarios: incremental read, clustering, etc.

In all scenarios, users can choose between a release version and a locally
built version of Hudi.

This is not meant to replace the current "docker" demo. It could live under a
"quickstart" directory and aim to be a more focused, quick sandbox. A typical
dev flow would be:
1. change some code
2. run mvn install -DskipTests
3. play with affected scenarios to verify the change

Any thoughts or comments? Thank you.
