[
https://issues.apache.org/jira/browse/BAHIR-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054539#comment-16054539
]
ASF GitHub Bot commented on BAHIR-110:
--------------------------------------
GitHub user emlaver opened a pull request:
https://github.com/apache/bahir/pull/45
[BAHIR-110] Implement _changes API for non-streaming receiver
See [JIRA-110](https://issues.apache.org/jira/browse/BAHIR-110)
_What_
Add support for the _changes API in the non-streaming receiver (data frames
and SQL temp views).
_How_
- New CloudantConfig option `apiReceiver` selects whether the _all_docs or
_changes endpoint is used when loading Cloudant data into Spark data frames
and SQL temp tables
- The default for the non-streaming receiver is the `_all_docs` endpoint
- Base abstract config class that's extended by an _all_docs class and a
_changes class
- JsonStoreConfigManager includes the new 'cloudant.apiReceiver' config option
for this endpoint selection
- Updated README with details for 'cloudant.apiReceiver' option
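
A minimal sketch of the selection behaviour described above, written in Python for illustration only. It is not the connector's code: the function name, the exception class, and the assumption that the accepted values are exactly the two endpoint names are all hypothetical. Only the option key, the `_all_docs` default, and the idea of failing on an empty or invalid value come from this pull request.

```python
# Illustrative sketch only; names are hypothetical, not the connector's API.
ALL_DOCS = "_all_docs"
CHANGES = "_changes"
VALID_ENDPOINTS = (ALL_DOCS, CHANGES)  # assumed set of accepted values


class CloudantConfigError(ValueError):
    """Stand-in for the CloudantException mentioned in the commits."""


def resolve_api_receiver(conf):
    """Return the endpoint chosen by 'cloudant.apiReceiver' in a dict of options.

    Falls back to _all_docs when the key is absent, and raises when the
    value is empty or not one of the assumed endpoint names.
    """
    key = "cloudant.apiReceiver"
    if key not in conf:
        return ALL_DOCS  # default for the non-streaming receiver
    value = conf[key].strip()
    if value not in VALID_ENDPOINTS:
        raise CloudantConfigError(
            f"{key} must be one of {VALID_ENDPOINTS}, got {value!r}"
        )
    return value
```

With this sketch, an options dict without the key resolves to `_all_docs`, while an empty string raises the stand-in exception instead of silently falling back.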
_Testing_
- Added base class ClientSparkFunSuite, which sets up Cloudant, creates the
test databases, and loads sample data into them from flat files.
- CloudantAllDocsDFSuite to test Spark data frames using the _all_docs
endpoint.
- CloudantChangesDFSuite to test Spark data frames using the _changes
endpoint.
- CloudantOptionSuite to verify Cloudant config options.
- CloudantSparkSQLSuite to test Spark SQL temp views.
Note: 27,378 lines added for the JSON files used in the testing suite.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/emlaver/bahir 110-implement-changes-api-in-receiver
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/bahir/pull/45.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #45
----
commit 065752978e7d826cd311df4c517044183db0c372
Author: Esteban Laver <[email protected]>
Date: 2017-06-16T18:17:30Z
Excluded scala flat files for testing from build
commit 751a5c7b876eca822e3047a609165505b5eadc4c
Author: Esteban Laver <[email protected]>
Date: 2017-06-16T18:23:39Z
Added MapReduce example, removed unused imports, and replaced SQL TEMP
TABLE with TEMP VIEW
commit 39be19029a5b02820749ea7a5e19f345a4d74de0
Author: Esteban Laver <[email protected]>
Date: 2017-06-19T15:22:41Z
New CloudantConfig option `apiReceiver` for selecting _all_docs and
_changes endpoint in Cloudant to Spark data frames and SQL temp tables
- Default is `_all_docs` endpoint for non-streaming receiver
- Base abstract config class that's extended by an all_docs class and
_changes class
- CloudantException thrown when required Spark Cloudant config option is
empty or invalid
- Updated scala style
commit b662611722d118800b1135ab69f02a979ebedb3c
Author: Esteban Laver <[email protected]>
Date: 2017-06-19T15:23:28Z
JsonStoreConfigManager: new 'cloudant.apiReceiver' config option for
selecting _all_docs and _changes endpoint in Cloudant to Spark data frames and
SQL temp tables
- Throw CloudantException when spark config value is invalid or empty
JsonStoreDataAccess: Added selector for use with _changes API and to filter
out design docs
JsonStoreRDD: Partition set to 1 for _changes API
Updated Scala style in common classes:
- Fixed ordering of imports
- Added type notation
- Removed redundant parenthesis
commit 4c8fc6bff81df034e789d2e782db11fca6e7cd84
Author: Esteban Laver <[email protected]>
Date: 2017-06-19T15:25:28Z
JSON files and logging properties for testing suite
commit a798f4cd1ef63f10a2f261f3c4460ef018d8d95d
Author: Esteban Laver <[email protected]>
Date: 2017-06-19T15:28:02Z
Testing suite:
ClientSparkFunSuite for setting up, creating, and loading sample data from
flat files to test databases.
CloudantAllDocsDFSuite to test Spark data frames using the _all_docs
endpoint.
CloudantChangesDFSuite to test Spark data frames using the _changes
endpoint.
CloudantOptionSuite to verify Cloudant config options.
CloudantSparkSQLSuite to test Spark SQL temp views.
- Version 2.6.7 for jackson dependencies resolves "Incompatible Jackson
version" during build
- Cloudant set-up and database creation using cloudant-client library
commit c6ecb836ef0eb714398360b21ab95f1c18b762e1
Author: Esteban Laver <[email protected]>
Date: 2017-06-19T15:28:23Z
Updated README
- New option 'cloudant.apiReceiver' for selecting _all_docs or _changes
endpoint
- Fixed links to source code files
----
> Replace use of _all_docs API with _changes API in all receivers
> ---------------------------------------------------------------
>
> Key: BAHIR-110
> URL: https://issues.apache.org/jira/browse/BAHIR-110
> Project: Bahir
> Issue Type: Improvement
> Reporter: Esteban Laver
> Original Estimate: 216h
> Remaining Estimate: 216h
>
> Today we use the _changes API for Spark streaming receiver and _all_docs API
> for non-streaming receiver. _all_docs API supports parallel reads (using
> offset and range) but performance of _changes API is still better in most
> cases (even with single threaded support).
> With this ticket we want to:
> a) re-implement all receivers using _changes API
> b) compare performance between the two implementations based on _changes and
> _all_docs
> Based on the results in b) we could decide to either
> - replace _all_docs implementation with _changes based implementation OR
> - allow customers to pick one (with a solid documentation about pros and
> cons)
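
The read-parallelism trade-off in the issue description can be pictured with a small sketch, written in Python for illustration only. It assumes the "offset and range" of an _all_docs read map to `skip` and `limit` values, and it models a _changes read as a single task, in line with the "Partition set to 1 for _changes API" commit. The function name and its parameters are hypothetical and are not taken from JsonStoreRDD.

```python
# Illustrative sketch only; plan_partitions is a hypothetical helper.
def plan_partitions(endpoint, total_docs, max_partitions):
    """Return a list of (skip, limit) pairs, one per read task."""
    if endpoint == "_changes":
        # Assumed model: the change feed is read sequentially by one task.
        return [(0, total_docs)]

    # _all_docs: split the documents into contiguous, near-equal ranges.
    parts = max(1, min(max_partitions, total_docs))
    base, extra = divmod(total_docs, parts)
    plan = []
    skip = 0
    for i in range(parts):
        # The first `extra` ranges take one additional document each.
        limit = base + (1 if i < extra else 0)
        plan.append((skip, limit))
        skip += limit
    return plan
```

The ranges produced for `_all_docs` are contiguous and cover every document exactly once, so each one could be fetched by a separate task, whereas the `_changes` plan always contains a single entry.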
--
This message was sent by Atlassian JIRA (v6.4.14#64029)