[GitHub] bahir issue #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/57 @emlaver since you are still working on this code, do you think you could take care of those deprecation warnings this time around? :-) ---
[GitHub] bahir issue #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/57 After more than 10 successful builds I'd say that the flaky `sql-cloudant` tests are more stable now than they have ever been. Good work @emlaver. Once you have confirmed that there is no conflict with the copyright for the new code I am happy to merge this. ---
[GitHub] bahir issue #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/57 retest this please ---
[GitHub] bahir issue #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/57 Uh oh, @emlaver unless someone else was running the same tests concurrently to the last Jenkins test run, there may still be some work to be done. ---
[GitHub] bahir issue #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/57 uh, maybe one more, just to be sure :-) retest this please ---
[GitHub] bahir issue #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/57 one more for good measure retest this please ---
[GitHub] bahir issue #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/57 retest this please ---
[GitHub] bahir issue #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/57 @emlaver -- I may have influenced the test execution by running the `mvn test sql-cloudant -q` locally while the Jenkins PR builder was running the same tests (I was using the same Cloudant account as Jenkins). I just restarted the build. ---
[GitHub] bahir issue #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/57 retest this please ---
[GitHub] bahir issue #58: [BAHIR-152] Enforce License Header in Java Sources
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/58 Thanks @tedyu for review and @lresende for merging. ---
[GitHub] bahir pull request #58: [BAHIR-152] Enforce License Header in Java Sources
GitHub user ckadner opened a pull request: https://github.com/apache/bahir/pull/58 [BAHIR-152] Enforce License Header in Java Sources [BAHIR-152: License header not enforced for Java sources](https://issues.apache.org/jira/browse/BAHIR-152) Add a `Header` rule to the `checkstyle` configuration to enforce proper Apache license headers in `*.java` source files. A similar `HeaderMatchesChecker` rule already exists in the `scalastyle` configuration to enforce the license headers in `*.scala` source files. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ckadner/bahir BAHIR-152_java_license_header Alternatively you can review and apply these changes as the patch at: https://github.com/apache/bahir/pull/58.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #58 commit 245db8fc6b45ede6a4a2d3be9740e50a20a29a33 Author: Christian Kadner Date: 2017-12-10T11:15:04Z [BAHIR-152] Enforce License Header in Java Sources Add a "Header" rule to the checkstyle configuration to enforce proper Apache license headers in Java source files. A similar rule ("HeaderMatchesChecker") already exists in the scalastyle configuration. Closes #58 ---
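For reference, a checkstyle `Header` rule of the kind this PR describes might look like the sketch below; the property values are illustrative assumptions, not copied from Bahir's actual configuration:

```xml
<!-- Sketch of a checkstyle Header rule; the headerFile value is illustrative. -->
<module name="Header">
  <!-- File containing the expected Apache license header text -->
  <property name="headerFile" value="${checkstyle.header.file}"/>
  <!-- Apply the check to Java sources only -->
  <property name="fileExtensions" value="java"/>
</module>
```

Checkstyle compares the first lines of each `*.java` file against the header file, mirroring what the `HeaderMatchesChecker` rule already does for `*.scala` sources in scalastyle.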
[GitHub] bahir issue #58: [BAHIR-152] Enforce License Header in Java Sources
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/58 ok to test ---
[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user ckadner commented on a diff in the pull request: https://github.com/apache/bahir/pull/57#discussion_r155936326 --- Diff: sql-cloudant/src/main/scala/org/apache/bahir/cloudant/common/ChangesRow.java --- @@ -0,0 +1,74 @@ +/* + * Copyright (c) 2017 IBM Cloudant. All rights reserved. + * + * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file + * except in compliance with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software distributed under the + * License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, + * either express or implied. See the License for the specific language governing permissions + * and limitations under the License. + */ +package org.apache.bahir.cloudant.common; + +import com.google.gson.JsonElement; +import com.google.gson.JsonObject; + +import java.util.List; + +/** + * Class representing a single row in a changes feed. Structure: + * + * { + * last_seq": 5 + * "results": [ + * ---*** This next items is the ChangesRow ***--- + * { + * "changes": [ {"rev": "2-eec205a9d413992850a6e32678485900"}, ... ], + * "deleted": true, + * "id": "deleted", + * "seq": 5, + * "doc": ... structure ... + * } + * ] + * } + */ +public class ChangesRow { --- End diff -- @emlaver -- Java sources should reside under `src/main/java` not `src/main/scala` unless you can convert this code to Scala (preferably) ---
[GitHub] bahir issue #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/57 Very odd that the RAT license checks succeeded. Apparently it is our `scalastyle` checks which complain about any non-conformant license headers, but those don't cover `*.java` files, so I need to update our `checkstyle` rules for Java sources. But I am still puzzled about RAT letting this pass (@lresende) ---
[GitHub] bahir issue #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/57 I just enabled the RAT check for our Jenkins PR builder. retest this please ---
[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user ckadner commented on a diff in the pull request: https://github.com/apache/bahir/pull/57#discussion_r155934996 --- Diff: sql-cloudant/src/main/scala/org/apache/bahir/cloudant/common/ChangesRowScanner.java --- @@ -0,0 +1,92 @@ +/* + * Copyright (c) 2017 IBM Cloudant. All rights reserved. + * --- End diff -- same [comment](https://github.com/apache/bahir/pull/57/commits/5e554103bee8162b85948e219dd4b7fdd7707a30#r155934979) as above ---
[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user ckadner commented on a diff in the pull request: https://github.com/apache/bahir/pull/57#discussion_r155934979 --- Diff: sql-cloudant/src/main/scala/org/apache/bahir/cloudant/common/ChangesRow.java --- @@ -0,0 +1,74 @@ +/* + * Copyright (c) 2017 IBM Cloudant. All rights reserved. --- End diff -- @emlaver -- is this an outdated copyright statement? If it is still valid you may need to check with the author and/or IBM if you can contribute this code (or variations of it) to open-source. I am surprised the RAT check did not catch this. CC @lresende ---
[GitHub] bahir issue #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/57 retest this please ---
[GitHub] bahir pull request #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user ckadner commented on a diff in the pull request: https://github.com/apache/bahir/pull/57#discussion_r155697683 --- Diff: pom.xml --- @@ -458,7 +458,7 @@ .gitignore .repository/ - .idea/ + **/.idea/** --- End diff -- this change should not be necessary. the `.idea/` folder should only get created at the project root level. any nested files and folders are already covered. maybe a mishap when setting up IntelliJ? ---
[GitHub] bahir issue #57: [BAHIR-128][WIP] fix failing sql-cloudant test
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/57 ok to test ---
[GitHub] bahir issue #55: [BAHIR-104] Multi-topic MQTT DStream in Python is now a Pai...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/55 Any further comments? @anntinutj - you commented on [BAHIR-104](https://issues.apache.org/jira/browse/BAHIR-104) @fbeneventi - I saw you worked on this before (https://github.com/fbeneventi/bahir/commit/3755ecc) ---
[GitHub] bahir issue #55: [BAHIR-104] Multi-topic MQTT DStream in Python is now a Pai...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/55 **LGTM** I ran the (new) Python tests manually and they completed successfully. **Before:** ``` [bahir] (master *=)$ streaming-mqtt/python-tests/run-python-tests.sh -- Ran 1 test in 22.871s OK ``` **After:** ``` [bahir] (pr-55_BAHIR-104_python_pairRDD *=)$ streaming-mqtt/python-tests/run-python-tests.sh -- Ran 2 tests in 27.593s OK ``` ---
[GitHub] bahir issue #55: [BAHIR-104] Multi-topic MQTT DStream in Python is now a Pai...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/55 @zubairnabi-intech -- we can ignore the build failure for now, but I still need to manually test your changes. ---
[GitHub] bahir issue #55: [BAHIR-104] Multi-topic MQTT DStream in Python is now a Pai...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/55 We still have failing tests in `sql-cloudant` but all other modules were built and tested successfully. ``` [INFO] [INFO] Reactor Summary: [INFO] [INFO] Apache Bahir - Parent POM .. SUCCESS [ 4.356 s] [INFO] Apache Bahir - Spark SQL Cloudant DataSource ... FAILURE [03:13 min] [INFO] Apache Bahir - Spark Streaming Akka SUCCESS [ 27.906 s] [INFO] Apache Bahir - Spark SQL Streaming Akka SUCCESS [03:49 min] [INFO] Apache Bahir - Spark Streaming MQTT SUCCESS [01:53 min] [INFO] Apache Bahir - Spark SQL Streaming MQTT SUCCESS [02:05 min] [INFO] Apache Bahir - Spark Streaming Twitter . SUCCESS [ 25.999 s] [INFO] Apache Bahir - Spark Streaming ZeroMQ .. SUCCESS [ 18.989 s] [INFO] Apache Bahir - Spark Streaming Google PubSub ... SUCCESS [ 53.220 s] [INFO] Apache Bahir - Spark Extensions Distribution ... SUCCESS [ 3.500 s] [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 13:16 min [INFO] Finished at: 2017-12-07T17:02:57-08:00 [INFO] Final Memory: 183M/6960M [INFO] ``` ---
[GitHub] bahir issue #55: [BAHIR-104] Multi-topic MQTT DStream in Python is now a Pai...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/55 retest this please ---
[GitHub] bahir issue #55: [BAHIR-104] Multi-topic MQTT DStream in Python is now a Pai...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/55 I changed the Jenkins build configuration to continue the build after failed modules (#56) ---
[GitHub] bahir pull request #56: [BAHIR-150] Test Jenkins PR build config changes (DO...
Github user ckadner closed the pull request at: https://github.com/apache/bahir/pull/56 ---
[GitHub] bahir issue #56: [BAHIR-150] Test Jenkins PR build config changes (DONT MERG...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/56 After adding the `--fail-at-end` flag to the PR builder's maven build, now all modules get built and tested even after the forced test failure in module `sql-cloudant`: ``` [INFO] [INFO] Reactor Summary: [INFO] [INFO] Apache Bahir - Parent POM .. SUCCESS [ 4.371 s] [INFO] Apache Bahir - Spark SQL Cloudant DataSource ... FAILURE [03:44 min] [INFO] Apache Bahir - Spark Streaming Akka SUCCESS [ 28.087 s] [INFO] Apache Bahir - Spark SQL Streaming Akka SUCCESS [03:49 min] [INFO] Apache Bahir - Spark Streaming MQTT SUCCESS [01:51 min] [INFO] Apache Bahir - Spark SQL Streaming MQTT SUCCESS [02:08 min] [INFO] Apache Bahir - Spark Streaming Twitter . SUCCESS [ 25.335 s] [INFO] Apache Bahir - Spark Streaming ZeroMQ .. SUCCESS [ 19.410 s] [INFO] Apache Bahir - Spark Streaming Google PubSub ... SUCCESS [ 51.206 s] [INFO] Apache Bahir - Spark Extensions Distribution ... SUCCESS [ 3.566 s] [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 13:47 min [INFO] Finished at: 2017-12-07T16:37:37-08:00 [INFO] Final Memory: 174M/6788M [INFO] ``` ---
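For context, `--fail-at-end` (short form `-fae`) is a standard Maven reactor option; an invocation along these lines produces the behavior shown above, where every module still builds and tests after `sql-cloudant` fails (the exact Jenkins command line is an assumption):

```shell
# Build and test every reactor module, deferring the overall failure to the
# end of the run (-fae == --fail-at-end); without it, Maven marks all modules
# after the first failing one as SKIPPED.
mvn clean install --fail-at-end
```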
[GitHub] bahir issue #56: [BAHIR-150] Test Jenkins PR build config changes (DONT MERG...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/56 retest this please ---
[GitHub] bahir issue #56: [BAHIR-150] Test Jenkins PR build config changes (DONT MERG...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/56 retest this Jenkins ---
[GitHub] bahir issue #56: Dummy test to force build failure (DONT MERGE)
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/56 The test failure above reflects the current PR builder behavior, before adding the `--fail-at-end` flag to the maven build. All modules are skipped after test failure in `sql-cloudant`: ``` [INFO] --- maven-surefire-plugin:2.19.1:test (default-test) @ spark-sql-cloudant_2.11 --- --- T E S T S --- Results : Tests run: 0, Failures: 0, Errors: 0, Skipped: 0 [INFO] [INFO] --- maven-surefire-plugin:2.19.1:test (test) @ spark-sql-cloudant_2.11 --- [INFO] Skipping execution of surefire because it has already been run for this configuration [INFO] [INFO] --- scalatest-maven-plugin:1.0:test (test) @ spark-sql-cloudant_2.11 --- Discovery starting. Sql-cloudant tests that require Cloudant databases have been enabled by the environment variables CLOUDANT_USER and CLOUDANT_PASSWORD. Discovery completed in 186 milliseconds. Run starting. Expected test count is: 23 CloudantOptionSuite: - invalid api receiver option throws an error message - empty username option throws an error message - empty password option throws an error message - empty databaseName throws an error message ClientSparkFunSuite: CloudantChangesDFSuite: - dummy test to force build failure *** FAILED *** org.scalatest.exceptions.TestFailedException was thrown. 
(CloudantChangesDFSuite.scala:45) - load and save data from Cloudant database - load and count data from Cloudant search index - load data and verify deleted doc is not in results - load data and count rows in filtered dataframe - save filtered dataframe to database - save dataframe to database using createDBOnSave=true option - load and count data from view - load data from view with MapReduce function - load data and verify total count of selector, filter, and view option CloudantSparkSQLSuite: - verify results from temp view of database n_airportcodemapping - verify results from temp view of index in n_flight CloudantAllDocsDFSuite: - load and save data from Cloudant database - load and count data from Cloudant search index - load data and count rows in filtered dataframe - save filtered dataframe to database - save dataframe to database using createDBOnSave=true option - load and count data from view - load data from view with MapReduce function Run completed in 3 minutes, 21 seconds. Total number of tests run: 23 Suites: completed 6, aborted 0 Tests: succeeded 22, failed 1, canceled 0, ignored 0, pending 0 *** 1 TEST FAILED *** [INFO] [INFO] Reactor Summary: [INFO] [INFO] Apache Bahir - Parent POM .. SUCCESS [ 4.411 s] [INFO] Apache Bahir - Spark SQL Cloudant DataSource ... FAILURE [03:39 min] [INFO] Apache Bahir - Spark Streaming Akka SKIPPED [INFO] Apache Bahir - Spark SQL Streaming Akka SKIPPED [INFO] Apache Bahir - Spark Streaming MQTT SKIPPED [INFO] Apache Bahir - Spark SQL Streaming MQTT SKIPPED [INFO] Apache Bahir - Spark Streaming Twitter . SKIPPED [INFO] Apache Bahir - Spark Streaming ZeroMQ .. SKIPPED [INFO] Apache Bahir - Spark Streaming Google PubSub ... SKIPPED [INFO] Apache Bahir - Spark Extensions Distribution ... SKIPPED [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 03:44 min [INFO] Finished at: 2017-12-07T16:12:30-08:00 [INFO] Final Memory: 84M/2554M [INFO] ``` ---
[GitHub] bahir pull request #56: Dummy test to force build failure (DONT MERGE)
GitHub user ckadner opened a pull request: https://github.com/apache/bahir/pull/56 Dummy test to force build failure (DONT MERGE) This is a test of our Jenkins PR builder setup. **DON'T merge this PR!** You can merge this pull request into a Git repository by running: $ git pull https://github.com/ckadner/bahir patch-3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/bahir/pull/56.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #56 commit 03540109791beb799e6da76abe2ae9ef04f96def Author: Christian Kadner Date: 2017-12-08T00:07:08Z Dummy test to force build failure (DONT MERGE) ---
[GitHub] bahir issue #55: [BAHIR-104] Multi-topic MQTT DStream in Python is now a Pai...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/55 Actually, our Maven build does not kick off the Python tests, so we may have to test this PR "manually" for the time being. I will start on making the build changes independently. ---
[GitHub] bahir issue #55: [BAHIR-104] Multi-topic MQTT DStream in Python is now a Pai...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/55 The problem is that one failing test in `sql-cloudant` causes all remaining tests to be skipped. Which means this PR can't be tested. @emlaver -- If you had an actual fix, could you create a PR that we can quickly merge? Then this PR could be rebased on that latest code. If that is not possible, I will have to look into changing the Jenkins build to run the tests separately from (and after) the compile/package build and tell Maven to keep running after test failures. ---
[GitHub] bahir issue #55: [BAHIR-104] Multi-topic MQTT DStream in Python is now a Pai...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/55 @emlaver -- could you take a look at the build failure? Thanks ``` CloudantChangesDFSuite: - load and save data from Cloudant database *** FAILED *** 0 did not equal 1967 (CloudantChangesDFSuite.scala:51) ``` ... and with a bit more log context: ``` [INFO] --- scalatest-maven-plugin:1.0:test (test) @ spark-sql-cloudant_2.11 --- Discovery starting. Sql-cloudant tests that require Cloudant databases have been enabled by the environment variables CLOUDANT_USER and CLOUDANT_PASSWORD. Discovery completed in 187 milliseconds. Run starting. Expected test count is: 22 CloudantOptionSuite: - invalid api receiver option throws an error message - empty username option throws an error message - empty password option throws an error message - empty databaseName throws an error message ClientSparkFunSuite: CloudantChangesDFSuite: - load and save data from Cloudant database *** FAILED *** 0 did not equal 1967 (CloudantChangesDFSuite.scala:51) - load and count data from Cloudant search index - load data and verify deleted doc is not in results - load data and count rows in filtered dataframe - save filtered dataframe to database - save dataframe to database using createDBOnSave=true option - load and count data from view - load data from view with MapReduce function - load data and verify total count of selector, filter, and view option CloudantSparkSQLSuite: - verify results from temp view of database n_airportcodemapping - verify results from temp view of index in n_flight CloudantAllDocsDFSuite: - load and save data from Cloudant database - load and count data from Cloudant search index - load data and count rows in filtered dataframe - save filtered dataframe to database - save dataframe to database using createDBOnSave=true option - load and count data from view - load data from view with MapReduce function Run completed in 3 minutes, 8 seconds. 
Total number of tests run: 22 Suites: completed 6, aborted 0 Tests: succeeded 21, failed 1, canceled 0, ignored 0, pending 0 *** 1 TEST FAILED *** [INFO] [INFO] Reactor Summary: [INFO] [INFO] Apache Bahir - Parent POM .. SUCCESS [ 4.355 s] [INFO] Apache Bahir - Spark SQL Cloudant DataSource ... FAILURE [06:50 min] [INFO] Apache Bahir - Spark Streaming Akka SKIPPED [INFO] Apache Bahir - Spark SQL Streaming Akka SKIPPED [INFO] Apache Bahir - Spark Streaming MQTT SKIPPED [INFO] Apache Bahir - Spark SQL Streaming MQTT SKIPPED [INFO] Apache Bahir - Spark Streaming Twitter . SKIPPED [INFO] Apache Bahir - Spark Streaming ZeroMQ .. SKIPPED [INFO] Apache Bahir - Spark Streaming Google PubSub ... SKIPPED [INFO] Apache Bahir - Spark Extensions Distribution ... SKIPPED [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 06:55 min [INFO] Finished at: 2017-12-05T14:33:29-08:00 [INFO] Final Memory: 67M/2606M [INFO] [ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:1.0:test (test) on project spark-sql-cloudant_2.11: There are test failures -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :spark-sql-cloudant_2.11 ``` ---
[GitHub] bahir issue #49: BAHIR-130
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/49 @romeokienzler -- any updates or progress? ---
[GitHub] bahir issue #51: [BAHIR-139] Force scala-maven-plugin to use java.version 1....
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/51 ok to test ---
[GitHub] bahir pull request #51: [BAHIR-139] Force scala-maven-plugin to use java.ver...
GitHub user ckadner opened a pull request: https://github.com/apache/bahir/pull/51 [BAHIR-139] Force scala-maven-plugin to use java.version Make sure the *scala-maven-plugin* uses `${java.version}` `1.8` instead of the default which is Java version `1.6`. Also upgrading the scala-maven-plugin version from `3.2.2` to `3.3.1` JIRA: [BAHIR-139](https://issues.apache.org/jira/browse/BAHIR-139) You can merge this pull request into a Git repository by running: $ git pull https://github.com/ckadner/bahir BAHIR-139_scala-maven-plugin_Java_compile_version Alternatively you can review and apply these changes as the patch at: https://github.com/apache/bahir/pull/51.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #51 commit 1b3848f26956f32712b62d28b2abf0182d072c8f Author: Christian Kadner Date: 2017-10-10T23:54:36Z [BAHIR-139] Force scala-maven-plugin to use java.version Make sure the scala-maven-plugin uses java.version 1.8 instead of the default which is Java 1.6. Also upgrading the scala-maven-plugin version from 3.2.2 to 3.3.1 Closes #51 ---
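For reference, pinning the compile level as described might look like the following sketch in the parent `pom.xml`; the exact property wiring in Bahir's build is an assumption:

```xml
<plugin>
  <groupId>net.alchim31.maven</groupId>
  <artifactId>scala-maven-plugin</artifactId>
  <version>3.3.1</version>
  <configuration>
    <!-- Compile against ${java.version} (1.8) instead of the plugin default (1.6) -->
    <source>${java.version}</source>
    <target>${java.version}</target>
    <javacArgs>
      <javacArg>-source</javacArg>
      <javacArg>${java.version}</javacArg>
      <javacArg>-target</javacArg>
      <javacArg>${java.version}</javacArg>
    </javacArgs>
  </configuration>
</plugin>
```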
[GitHub] bahir issue #50: [BAHIR-123] Support latest version of play-json for sql-clo...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/50 **LGTM** ---
[GitHub] bahir issue #50: [BAHIR-123] Support latest version of play-json for sql-clo...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/50 @emlaver -- Right, I had not realized these deprecation warnings got introduced by another PR prior to this. Thanks for opening a JIRA to track it. ---
[GitHub] bahir issue #50: [BAHIR-123] Support latest version of play-json for sql-clo...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/50 @emlaver -- can you take care of these deprecation **`WARNING`** messages? ``` 13:10:38 [INFO] Compiling 11 Scala sources to /var/lib/jenkins/workspace/bahir_spark_pr_builder/sql-cloudant/target/scala-2.11/classes... 13:10:42 [WARNING] /var/lib/jenkins/workspace/bahir_spark_pr_builder/sql-cloudant/src/main/scala/org/apache/bahir/cloudant/DefaultSource.scala:59: method json in class DataFrameReader is deprecated: Use json(Dataset[String]) instead. 13:10:42 [WARNING] val df = sqlContext.read.json(cloudantRDD) 13:10:42 [WARNING] ^ 13:10:42 [WARNING] /var/lib/jenkins/workspace/bahir_spark_pr_builder/sql-cloudant/src/main/scala/org/apache/bahir/cloudant/DefaultSource.scala:115: method json in class DataFrameReader is deprecated: Use json(Dataset[String]) instead. 13:10:42 [WARNING] dataFrame = sqlContext.read.json(cloudantRDD) 13:10:42 [WARNING] ^ 13:10:42 [WARNING] /var/lib/jenkins/workspace/bahir_spark_pr_builder/sql-cloudant/src/main/scala/org/apache/bahir/cloudant/DefaultSource.scala:121: method json in class DataFrameReader is deprecated: Use json(Dataset[String]) instead. 13:10:42 [WARNING] sqlContext.read.json(aRDD) 13:10:42 [WARNING] ^ 13:10:42 [WARNING] /var/lib/jenkins/workspace/bahir_spark_pr_builder/sql-cloudant/src/main/scala/org/apache/bahir/cloudant/DefaultSource.scala:152: method json in class DataFrameReader is deprecated: Use json(Dataset[String]) instead. 13:10:42 [WARNING] dataFrame = sqlContext.sparkSession.read.json(globalRDD) 13:10:42 [WARNING]^ 13:10:45 [WARNING] four warnings found ``` ``` 13:10:46 [INFO] Compiling 11 Scala sources to /var/lib/jenkins/workspace/bahir_spark_pr_builder/sql-cloudant/target/scala-2.11/test-classes... 
13:10:49 [WARNING] /var/lib/jenkins/workspace/bahir_spark_pr_builder/sql-cloudant/examples/src/main/scala/org/apache/spark/examples/sql/cloudant/CloudantStreaming.scala:46: method json in class DataFrameReader is deprecated: Use json(Dataset[String]) instead. 13:10:49 [WARNING] val changesDataFrame = spark.read.json(rdd) 13:10:49 [WARNING] ^ 13:10:49 [WARNING] /var/lib/jenkins/workspace/bahir_spark_pr_builder/sql-cloudant/examples/src/main/scala/org/apache/spark/examples/sql/cloudant/CloudantStreaming.scala:67: method registerTempTable in class Dataset is deprecated: Use createOrReplaceTempView(viewName) instead. 13:10:49 [WARNING] changesDataFrame.registerTempTable("airportcodemapping") 13:10:49 [WARNING]^ 13:10:49 [WARNING] /var/lib/jenkins/workspace/bahir_spark_pr_builder/sql-cloudant/examples/src/main/scala/org/apache/spark/examples/sql/cloudant/CloudantStreamingSelector.scala:50: method json in class DataFrameReader is deprecated: Use json(Dataset[String]) instead. 13:10:49 [WARNING] val changesDataFrame = spark.read.json(rdd) 13:10:49 [WARNING] ^ 13:10:52 [WARNING] three warnings found ``` ---
[GitHub] bahir issue #50: [BAHIR-123] Support latest version of play-json for sql-clo...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/50 @emlaver -- one down, one to go? :smile: > ~`CloudantChangesDFSuite`:~ > ~`- save dataframe to database using createDBOnSave=true option *** FAILED ***`~ > `CloudantAllDocsDFSuite`: > `- save dataframe to database using createDBOnSave=true option *** FAILED ***` ---
[GitHub] bahir issue #50: [BAHIR-123] Support latest version of play-json for sql-clo...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/50 @emlaver > `CloudantChangesDFSuite`: > `- save dataframe to database using createDBOnSave=true option FAILED` **Test failures:** ``` 14:10:05 [INFO] --- scalatest-maven-plugin:1.0:test (test) @ spark-sql-cloudant_2.11 --- 14:10:05 Discovery starting. 14:10:05 14:10:05 Sql-cloudant tests that require Cloudant databases have been enabled by 14:10:05 the environment variables CLOUDANT_USER and CLOUDANT_PASSWORD. 14:10:05 14:10:05 Discovery completed in 187 milliseconds. 14:10:05 Run starting. Expected test count is: 22 14:10:05 CloudantOptionSuite: 14:10:09 - invalid api receiver option throws an error message 14:10:09 - empty username option throws an error message 14:10:09 - empty password option throws an error message 14:10:10 - empty databaseName throws an error message 14:10:10 ClientSparkFunSuite: 14:10:10 CloudantChangesDFSuite: 14:10:34 - load and save data from Cloudant database 14:10:36 - load and count data from Cloudant search index 14:10:52 - load data and verify deleted doc is not in results 14:11:12 - load data and count rows in filtered dataframe 14:11:52 - save filtered dataframe to database 14:12:12 - save dataframe to database using createDBOnSave=true option *** FAILED *** 14:12:12 org.apache.bahir.cloudant.common.CloudantException: Database airportcodemapping_df create error: {"error":"file_exists","reason":"The database could not be created, the file already exists."} 14:12:12 at org.apache.bahir.cloudant.common.JsonStoreDataAccess.createDB(JsonStoreDataAccess.scala:143) 14:12:12 at org.apache.bahir.cloudant.CloudantReadWriteRelation.insert(DefaultSource.scala:72) 14:12:12 at org.apache.bahir.cloudant.DefaultSource.createRelation(DefaultSource.scala:172) 14:12:12 at org.apache.bahir.cloudant.DefaultSource.createRelation(DefaultSource.scala:86) 14:12:12 at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:472) 14:12:12 at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48) 14:12:12 at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) 14:12:12 at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) 14:12:12 at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) 14:12:12 at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117) 14:12:12 ... 14:12:13 - load and count data from view 14:12:13 - load data from view with MapReduce function 14:12:53 - load data and verify total count of selector, filter, and view option 14:12:53 CloudantSparkSQLSuite: 14:12:56 - verify results from temp view of database n_airportcodemapping 14:12:59 - verify results from temp view of index in n_flight 14:13:00 CloudantAllDocsDFSuite: 14:13:03 - load and save data from Cloudant database 14:13:04 - load and count data from Cloudant search index 14:13:04 - load data and count rows in filtered dataframe 14:13:06 - save filtered dataframe to database 14:13:07 - save dataframe to database using createDBOnSave=true option *** FAILED *** 14:13:07 org.apache.bahir.cloudant.common.CloudantException: Database airportcodemapping_df create error: {"error":"file_exists","reason":"The database could not be created, the file already exists."} 14:13:07 at org.apache.bahir.cloudant.common.JsonStoreDataAccess.createDB(JsonStoreDataAccess.scala:143) 14:13:07 at org.apache.bahir.cloudant.CloudantReadWriteRelation.insert(DefaultSource.scala:72) 14:13:07 at org.apache.bahir.cloudant.DefaultSource.createRelation(DefaultSource.scala:172) 14:13:07 at org.apache.bahir.cloudant.DefaultSource.createRelation(DefaultSource.scala:86) 14:13:07 at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:472) 14:13:07 at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48) 14:13:07 at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) 14:13:07 at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) 14:13:07 at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) 14:13:07 at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117) 14:13:07 ... 14:13:07 - load and count data from view 14:13:07 - load data from view with MapReduce function
[GitHub] bahir issue #50: [BAHIR-123] Support latest version of play-json for sql-clo...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/50 ok to test ---
[GitHub] bahir issue #49: BAHIR-130
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/49 @romeokienzler -- are you still working on incorporating @emlaver's review comments? ---
[GitHub] bahir issue #48: [BAHIR-122] [PubSub] Make "ServiceAccountCredentials" reall...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/48 I removed the `util-hadoop` dependency and merged this PR (@bchen-talend) @ire7715 -- Thanks for your PR! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] bahir issue #48: [BAHIR-122] [PubSub] Make "ServiceAccountCredentials" reall...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/48 @ire7715 -- I created a [Google API Service account](https://console.developers.google.com/iam-admin/serviceaccounts/project?project=apache-bahir-pubsub) and [added the generated key files](https://support.cloudbees.com/hc/en-us/articles/203802500-Injecting-Secrets-into-Jenkins-Build-Jobs) to our Jenkins server. All your tests appear to be [enabled and complete successfully](http://169.45.79.58:8080/job/bahir_spark_pr_builder/95/) now.
```
[INFO] --- scalatest-maven-plugin:1.0:test (test) @ spark-streaming-pubsub_2.11 ---
Discovery starting.
Google Pub/Sub tests that actually send data has been enabled by setting the environment variable ENABLE_PUBSUB_TESTS to 1.
This will create Pub/Sub Topics and Subscriptions in Google cloud platform. Please be aware that this may incur some Google cloud costs.
Set the environment variable GCP_TEST_PROJECT_ID to the desired project.
Discovery completed in 135 milliseconds.
Run starting. Expected test count is: 10
SparkGCPCredentialsBuilderSuite:
- should build application default
- should build json service account
- should provide json creds
- should build p12 service account
- should provide p12 creds
- should build metadata service account
- SparkGCPCredentials classes should be serializable
Using project apache-bahir-pubsub for creating Pub/Sub topic and subscription for tests.
PubsubStreamSuite:
- PubsubUtils API
- pubsub input stream
- pubsub input stream, create pubsub
Run completed in 14 seconds, 143 milliseconds.
Total number of tests run: 10
Suites: completed 3, aborted 0
Tests: succeeded 10, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
``` --- Would you **please add a short paragraph** to the [PubSub README](https://github.com/apache/bahir/blob/master/streaming-pubsub/README.md) describing how to enable your unit tests by setting the environment variables (and how to set up a Google API *service account*, generate *key files* and how to minimally configure the *Roles* like "Pub/Sub Publisher", etc)? i.e.:
```Bash
mvn clean package -DskipTests -pl streaming-pubsub
export ENABLE_PUBSUB_TESTS=1
export GCP_TEST_ACCOUNT="apache-bahir-streaming-pub...@apache-bahir-pubsub.iam.gserviceaccount.com"
export GCP_TEST_PROJECT_ID="apache-bahir-pubsub"
export GCP_TEST_JSON_KEY_PATH=/path/to/pubsub/credential/files/Apache-Bahir-PubSub-1234abcd.json
export GCP_TEST_P12_KEY_PATH=/path/to/pubsub/credential/files/Apache-Bahir-PubSub-5678efgh.p12
mvn test -pl streaming-pubsub
```
**Thank you!** ---
[GitHub] bahir issue #48: [BAHIR-122] [PubSub] Make "ServiceAccountCredentials" reall...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/48 thanks @ire7715 -- I have a few remarks regarding your latest comment: --- > Don't know if the force push would bother you when reviewing Thanks for not force-pushing :+1: -- It's preferable to have multiple commits in response to PR review comments and change requests. This makes it much easier to come back later to see how code changes came about. Bahir committers will squash all commits when merging Pull Requests. So, please push another "normal" commit with your latest changes. --- > `SparkGCPCredentialsBuilderSuite` ... ignores the test cases if the key files or email account [environment variables] are not set (or file doesn't exist) and shows the hint message I agree mostly. We should ignore the test cases if env variables are not set. However, if the environment variables **are set** and the key file **path is invalid** then that should be an **error**. Otherwise we may not catch problems if there are changes in the Jenkins CI server. Could you generate a set of (permanent) key files which we can integrate into our Jenkins PR builder? ---
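The skip-versus-fail policy requested above could be sketched as follows. This is an illustrative helper only, not the actual `SparkGCPCredentialsBuilderSuite` code; the object and method names are hypothetical:

```scala
import java.nio.file.{Files, Paths}

// Hypothetical sketch of the suggested policy: skip a test case when the
// environment variable is not set, but fail when it is set to a key-file
// path that does not exist, so a misconfigured Jenkins CI server is
// caught instead of silently skipping the tests.
object KeyFilePolicy {
  sealed trait TestAction
  case object RunTest extends TestAction
  case object SkipTest extends TestAction
  case class FailTest(reason: String) extends TestAction

  def actionFor(envValue: Option[String]): TestAction = envValue match {
    case None                                        => SkipTest
    case Some(path) if Files.exists(Paths.get(path)) => RunTest
    case Some(path)                                  => FailTest(s"key file not found: $path")
  }
}
```

A test suite would call something like `KeyFilePolicy.actionFor(sys.env.get("GCP_TEST_JSON_KEY_PATH"))` in its setup and either run, cancel, or fail the affected cases accordingly.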
[GitHub] bahir issue #48: [BAHIR-122] [PubSub] Make "ServiceAccountCredentials" reall...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/48 Thanks @ire7715 for your fixes. Re: key file ([comment, July 21](https://github.com/apache/bahir/pull/48#discussion_r128909173)) > **ckadner:** are there no risks with making this key-file public? > **ire7715 :** Yes, it is okay. The key was generated as a dummy IAM service account, which now have been removed. And I have interchanged part of the private key bytes, which makes it unusable. So, the key file is unusable for the unit test runs? If so, then there would be no reason to add it as a test resource, no? Is the idea then to communicate to developers how/where to add a key file they would have to generate for themselves? Would it be better then to have the unit test display a warning message in the console output if the key file is missing and skip the impacted test case(s)? For the Jenkins CI server, we would have to install a key file that does work (keeps working) and in a pre-build step copy it from somewhere, or use an environment variable to point to a local directory that has the key file. ---
[GitHub] bahir issue #45: [BAHIR-110] Implement _changes API for non-streaming receiv...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/45 Thanks @emlaver for this PR and @mayya-sharipova for your thorough review! **LGTM** ---
[GitHub] bahir issue #48: [BAHIR-122] [PubSub] Make "ServiceAccountCredentials" reall...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/48 @ire7715
```
[INFO] --- scala-maven-plugin:3.2.2:compile (scala-compile-first) @ spark-streaming-pubsub_2.11 ---
...
[INFO] Compiling 3 Scala sources to /var/lib/jenkins/workspace/bahir_spark_pr_builder/streaming-pubsub/target/scala-2.11/classes...
[ERROR] /var/lib/jenkins/workspace/bahir_spark_pr_builder/streaming-pubsub/src/main/scala/org/apache/spark/streaming/pubsub/SparkGCPCredentials.scala:20: object jackson is not a member of package com.google.api.client.json
[ERROR] import com.google.api.client.json.jackson.JacksonFactory
[ERROR]        ^
[ERROR] /var/lib/jenkins/workspace/bahir_spark_pr_builder/streaming-pubsub/src/main/scala/org/apache/spark/streaming/pubsub/SparkGCPCredentials.scala:74: not found: type JacksonFactory
[ERROR]     val jsonFactory = new JacksonFactory
[ERROR]                           ^
[ERROR] two errors found
```
---
[GitHub] bahir pull request #48: [BAHIR-122] [PubSub] Make "ServiceAccountCredentials...
Github user ckadner commented on a diff in the pull request: https://github.com/apache/bahir/pull/48#discussion_r128862044 --- Diff: streaming-pubsub/src/main/scala/org/apache/spark/streaming/pubsub/SparkGCPCredentials.scala --- @@ -17,10 +17,13 @@ package org.apache.spark.streaming.pubsub +import com.google.api.client.json.jackson.JacksonFactory --- End diff -- this dependency may have to be added to `streaming-pubsub/pom.xml` ---
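For illustration, declaring that dependency might look roughly like the fragment below. The artifact coordinates are an assumption based on the Google HTTP client libraries (the `jackson` package of `JacksonFactory` ships in `google-http-client-jackson`, while the newer `jackson2` variant lives in `google-http-client-jackson2`); the version shown is a placeholder, not taken from this PR:

```xml
<!-- Hypothetical sketch for streaming-pubsub/pom.xml: provides
     com.google.api.client.json.jackson.JacksonFactory.
     Version is illustrative only. -->
<dependency>
  <groupId>com.google.http-client</groupId>
  <artifactId>google-http-client-jackson</artifactId>
  <version>1.22.0</version>
</dependency>
```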
[GitHub] bahir pull request #48: [BAHIR-122] [PubSub] Make "ServiceAccountCredentials...
Github user ckadner commented on a diff in the pull request: https://github.com/apache/bahir/pull/48#discussion_r128861101 --- Diff: streaming-pubsub/src/test/resources/org/apache/spark/streaming/pubusb/key-file.json --- @@ -0,0 +1,12 @@ +{ + "type": "service_account", + "project_id": "apache-bahir-streaming-pubusb", + "private_key_id": "**this-is-fake-key-id***", + "private_key": "-BEGIN PRIVATE KEY-\nMIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQD6c9MDG3gq3d+3\nV6AqayUNWlC/T5Qrd3YJOItNgDxZ0bAl9FakePrivateKey1dX44uR4FomugRX3s\nENwGRcEndczGcGivTfFEB8ZeEokBQWfuWoQkJXSPaJ1rYca3l//caWxBJ+DqBw67\nF9vJqyJ23Z/kFtQOdB3+5AwfJ0b8Jq5mkQF9FL6843mHjep2LhVTcKbjJBz0K+cS\nUDr4MEoxsc0jvIDf3EwbeGWPayRzB6d558eVa+OrcCKpTxGvBJmhzsI2Ol2EcypA\nIDOFZ7OkobWdxDYhM9vUCPUNKmMs0doR9Hola8XO92D2Y4q9BoCuU+hoDPEVVQOd\nOlKCuernAgMBAAECggEAe1/rJrC1dYhu2EZWJA875WQEOvncp7zlbI1qMfdlw2lE\nOK5gmcF3zIbhuKefsH38e7zVSTlFg2I4Mb3sZTqfd+zTvz1IlHL00upxkY3X58Js\njEISriu1S5/hTDCST4aVB+L27PHUHfT0EL4kCyg+hgeO6DFGrQgObq2wOviCQ1th\nPZGccIrvAXMwGA+6OaUpnPpBbXnZKarYTGLGjoVD2eLPx+viLRKl2AW9PChdkk/0\nZvHeL7bxbYHyktK8Vp4gHStBV421HkRNlvt5S31ju75P9ReHsxCpLt1OnhBHa/gD\nimlm5fWrSHoFHx2Q+zYVt/BhmWH/Dzq1Rd+e5/vkwQKBgQD9fj6SpOYnrgNLXa25\ny4p0VHXAHweH3fpqfsSJLLuc0TWDhEtrDhVTmX35N1J5J7GIWxMGFiQxlkj+6vm6\ncOfLSUYO++HOhWdIvNzRBUJ1NSa5oIfJITAH9vPYvmrdmr5+CNZAM1KsmV7CRhvJ\nScMTVjV0gqSFKEr3QKyCLw 
h3PwKBgQD87eHAwZp34DNNYWqPTb2Um9xegWnT5KYh\ntX3nxPRzyGfpPYeGedjWOwb5ST1KT0HlNhAPev02J6ZUhTrMjwHCnZcUlNiqDWdT\nlACNO810B98fO7GejjTEa6MqfaMG2m4UDA93hDBeuCOhHzXVfXvxLpUx0ABJR5Tg\nTMhkQ+AKWQKBgHYAysghEzLtgoMW/MQ8yBsXJillSHArGWNx17OzqzJ5AVxTvXf8\nelkMXuQgqLfVjoNXQifXLsoWl6xzXgU4ge7UEVTwVFF7MHVf1btHo4REVd6bqBos\n5NsQTrtbCQxX+M1a98GzIo1OaBov4Md3GuRpgUDXgBashxlKdgO0OVCpAoGANra6\n7Di1UpNEZcvaAk/938TroeH646SFr6sUJmv7uYQzvkfaJmP7XTR9qLWINaf5iDzu\nsnqXhfyDxargclnJNrFiekhMqlSl8nWEvQifxCbjxFzkank2vvrN3CY7ewMLZvjI\n68FFuem5g2Q+AAXaJu09xv3I4hFDClZxzkeY01kCgYEAv9a4vgpvGMHnjMEfq3Ym\ncbQIFq1l3djh4YqOy92EM0xr3nb1DEIvMshfhby5rwhejZ8j8m/lt/5t6uHd90/y\n60UcuPgJa2MgnPIIOZyQGH3C88o25WF9yvUAItbUtl9fxgJYdi/d9Hj821sZbhmF\nyZltoUeUMYMS4QW2OM6Dydk=\n-END PRIVATE KEY-\n", + "client_email": "pubsub-subscri...@apache-bahir-streaming-pubsub.iam.gserviceaccount.com", --- End diff -- are there no risks with making this `key-file` public? ---
[GitHub] bahir issue #48: [BAHIR-122] [PubSub] Make "ServiceAccountCredentials" reall...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/48 @bchen-talend -- can you take a look at this PR? ---
[GitHub] bahir issue #48: [BAHIR-122] [PubSub] Make "ServiceAccountCredentials" reall...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/48 ok to test ---
[GitHub] bahir pull request #47: [BAHIR-100] Implement new function to pass byte arra...
Github user ckadner commented on a diff in the pull request: https://github.com/apache/bahir/pull/47#discussion_r128432486 --- Diff: streaming-mqtt/README.md --- @@ -52,12 +52,14 @@ this actor can be configured to handle failures, etc. val lines = MQTTUtils.createStream(ssc, brokerUrl, topic) val lines = MQTTUtils.createPairedStream(ssc, brokerUrl, topic) +val lines = MQTTUtils.createPairedByteArrayStreamStream(ssc, brokerUrl, topic) --- End diff -- thanks @lresende ---
[GitHub] bahir pull request #47: [BAHIR-100] Implement new function to pass byte arra...
Github user ckadner commented on a diff in the pull request: https://github.com/apache/bahir/pull/47#discussion_r128401599 --- Diff: streaming-mqtt/README.md --- @@ -52,12 +52,14 @@ this actor can be configured to handle failures, etc. val lines = MQTTUtils.createStream(ssc, brokerUrl, topic) val lines = MQTTUtils.createPairedStream(ssc, brokerUrl, topic) +val lines = MQTTUtils.createPairedByteArrayStreamStream(ssc, brokerUrl, topic) --- End diff -- @davidrosenstark -- I assume the `StreamStream` word duplication is a copy-paste error? ~`val lines = MQTTUtils.createPairedByteArrayStreamStream(ssc, brokerUrl, topic)`~ `val lines = MQTTUtils.createPairedByteArrayStream(ssc, brokerUrl, topic)` ---
[GitHub] bahir issue #45: [WIP] [BAHIR-110] Implement _changes API for non-streaming ...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/45 @emlaver -- the most recent build ([77](http://169.45.79.58:8080/job/bahir_spark_pr_builder/77/)) ran with user/pwd env vars set ...
```
Discovery completed in 185 milliseconds.
Run starting. Expected test count is: 21
CloudantOptionSuite:
- invalid api receiver option throws an error message
- empty username option throws an error message
- empty password option throws an error message
- empty databaseName throws an error message
ClientSparkFunSuite:
CloudantChangesDFSuite:
- load and save data from Cloudant database
- load and count data from Cloudant search index
- load data and count rows in filtered dataframe
- save filtered dataframe to database
- save dataframe to database using createDBOnSave=true option
- load and count data from view
- load data from view with MapReduce function
- load data and verify total count of selector, filter, and view option
CloudantSparkSQLSuite:
- verify results from temp view of database n_airportcodemapping
- verify results from temp view of index in n_flight
CloudantAllDocsDFSuite:
- load and save data from Cloudant database
- load and count data from Cloudant search index
- load data and count rows in filtered dataframe
- save filtered dataframe to database
- save dataframe to database using createDBOnSave=true option
- load and count data from view
- load data from view with MapReduce function
Run completed in 2 minutes, 58 seconds.
Total number of tests run: 21
Suites: completed 6, aborted 0
Tests: succeeded 21, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```
---
[GitHub] bahir pull request #45: [WIP] [BAHIR-110] Implement _changes API for non-str...
Github user ckadner commented on a diff in the pull request: https://github.com/apache/bahir/pull/45#discussion_r125737077 --- Diff: sql-cloudant/README.md --- @@ -52,39 +51,61 @@ Here each subsequent configuration overrides the previous one. Thus, configurati ### Configuration in application.conf -Default values are defined in [here](cloudant-spark-sql/src/main/resources/application.conf). +Default values are defined in [here](src/main/resources/application.conf). ### Configuration on SparkConf Name | Default | Meaning --- |:---:| --- +cloudant.apiReceiver|"_all_docs"| API endpoint for RelationProvider when loading or saving data from Cloudant to DataFrames or SQL temporary tables. Select between "_all_docs" or "_changes" endpoint. cloudant.protocol|https|protocol to use to transfer data: http or https -cloudant.host||cloudant host url -cloudant.username||cloudant userid -cloudant.password||cloudant password +cloudant.host| |cloudant host url +cloudant.username| |cloudant userid +cloudant.password| |cloudant password cloudant.useQuery|false|By default, _all_docs endpoint is used if configuration 'view' and 'index' (see below) are not set. When useQuery is enabled, _find endpoint will be used in place of _all_docs when query condition is not on primary key field (_id), so that query predicates may be driven into datastore. cloudant.queryLimit|25|The maximum number of results returned when querying the _find endpoint. jsonstore.rdd.partitions|10|the number of partitions intent used to drive JsonStoreRDD loading query result in parallel. The actual number is calculated based on total rows returned and satisfying maxInPartition and minInPartition jsonstore.rdd.maxInPartition|-1|the max rows in a partition. -1 means unlimited jsonstore.rdd.minInPartition|10|the min rows in a partition. jsonstore.rdd.requestTimeout|90| the request timeout in milliseconds bulkSize|200| the bulk save size -schemaSampleSize| "-1" | the sample size for RDD schema discovery. 
1 means we are using only first document for schema discovery; -1 means all documents; 0 will be treated as 1; any number N means min(N, total) docs -createDBOnSave|"false"| whether to create a new database during save operation. If false, a database should already exist. If true, a new database will be created. If true, and a database with a provided name already exists, an error will be raised. +schemaSampleSize|-1| the sample size for RDD schema discovery. 1 means we are using only first document for schema discovery; -1 means all documents; 0 will be treated as 1; any number N means min(N, total) docs +createDBOnSave|false| whether to create a new database during save operation. If false, a database should already exist. If true, a new database will be created. If true, and a database with a provided name already exists, an error will be raised. + +The `cloudant.apiReceiver` option allows for _changes or _all_docs API endpoint to be called while loading Cloudant data into Spark DataFrames or SQL Tables, +or saving data from DataFrames or SQL Tables to a Cloudant database. + +**Note:** When using `_changes` API, please consider: +1. Results are partially ordered and may not be be presented in order in +which documents were updated. +2. In case of shards' unavailability, you may see duplicate results (changes that have been seen already) +3. Can use `selector` option to retrieve all revisions for docs +4. Only supports single threaded + +When using `_all_docs` API: +1. Supports parallel reads (using offset and range) + +Performance of `_changes` API is still better in most cases (even with single threaded support). +During several performance tests using 50 to 200 MB Cloudant databases, load time from Cloudant to Spark using `_changes` +feed was faster to complete every time compared to `_all_docs`. 
+ --- End diff -- the code style guide (enforced by build) is 100 characters per line ---
[GitHub] bahir pull request #45: [WIP] [BAHIR-110] Implement _changes API for non-str...
Github user ckadner commented on a diff in the pull request: https://github.com/apache/bahir/pull/45#discussion_r125737084 --- Diff: sql-cloudant/src/main/scala/org/apache/bahir/cloudant/CloudantAllDocsConfig.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.bahir.cloudant + +import org.apache.bahir.cloudant.common.JsonStoreConfigManager + +class CloudantAllDocsConfig(protocol: String, host: String, dbName: String, +indexName: String = null, viewName: String = null) + (username: String, password: String, partitions: Int, +maxInPartition: Int, minInPartition: Int, requestTimeout: Long, +bulkSize: Int, schemaSampleSize: Int, +createDBOnSave: Boolean, apiReceiver: String, selector: String, --- End diff -- @mayya-sharipova -- code style guide calls for 100 character per line limit See: https://github.com/apache/bahir/blob/fd4c35fc9f7ebb57464d231cf5d66e7bc4096a1b/scalastyle-config.xml#L78 ---
[GitHub] bahir issue #45: [BAHIR-110] Implement _changes API for non-streaming receiv...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/45 ok to test ---
[GitHub] bahir issue #45: [BAHIR-110] Implement _changes API for non-streaming receiv...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/45 thanks @emlaver -- let's kick off a test build without the environment variables set ---
[GitHub] bahir pull request #45: [BAHIR-110] Implement _changes API for non-streaming...
Github user ckadner commented on a diff in the pull request: https://github.com/apache/bahir/pull/45#discussion_r122793307 --- Diff: sql-cloudant/src/test/scala/org/apache/bahir/cloudant/TestUtils.scala --- @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.bahir.cloudant + +import java.io.File + +object TestUtils { + // List of test databases to create from JSON flat files + val testDatabasesList: List[String] = List( +"n_airportcodemapping", +"n_booking", +"n_customer", +"n_customersession", +"n_flight", +"n_flight2", +"n_flightsegment" + ) + + // Set CouchDB/Cloudant host, username and password for local testing + private val host = System.getenv("DB_HOST") --- End diff -- @emlaver -- could you prefix the environment variables with **CLOUDANT_** ?
```
CLOUDANT_DB_HOST
CLOUDANT_DB_USER
CLOUDANT_DB_PASSWORD
CLOUDANT_DB_PROTOCOL
```
---
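A minimal sketch of how the prefixed variables could be read in `TestUtils`, with a default only where one makes sense. This is illustrative, not the PR's actual code; the variable names follow the suggestion above and the `https` default is an assumption:

```scala
// Illustrative sketch: read the CLOUDANT_-prefixed environment
// variables suggested above and expose a guard so tests can be
// skipped when the required values are absent.
object CloudantTestEnv {
  private def env(name: String): Option[String] = Option(System.getenv(name))

  // Assumed default: fall back to https when no protocol is configured.
  val protocol: String = env("CLOUDANT_DB_PROTOCOL").getOrElse("https")
  val host: Option[String] = env("CLOUDANT_DB_HOST")
  val user: Option[String] = env("CLOUDANT_DB_USER")
  val password: Option[String] = env("CLOUDANT_DB_PASSWORD")

  // Tests should only run when all required values are present.
  def isConfigured: Boolean = host.isDefined && user.isDefined && password.isDefined
}
```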
[GitHub] bahir issue #43: [BAHIR-117] Expand filtering options for TwitterInputDStrea...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/43 LGTM -- Thanks for adding the example. I agree that [BAHIR-65](https://issues.apache.org/jira/browse/BAHIR-65) warrants a separate pull request. ---
[GitHub] bahir issue #43: [BAHIR-117] Expand filtering options for TwitterInputDStrea...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/43 @c-w I think this is a good enhancement! Could you also add a Scala example that utilizes the new `FilterQuery` parameter? And ideally add a corresponding unit test (see [BAHIR-65](https://issues.apache.org/jira/browse/BAHIR-65)). Thank you! ---
[GitHub] bahir issue #43: [BAHIR-117] Expand filtering options for TwitterInputDStrea...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/43
```
11:06:46 [INFO] --- scalastyle-maven-plugin:0.8.0:check (default-cli) @ spark-streaming-twitter_2.11 ---
11:06:47 error file=/var/lib/jenkins/workspace/bahir_spark_pr_builder/streaming-twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterUtils.scala message=Use Javadoc style indentation for multiline comments line=29 column=0
11:06:47 error file=/var/lib/jenkins/workspace/bahir_spark_pr_builder/streaming-twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterUtils.scala message=Use Javadoc style indentation for multiline comments line=156 column=0
11:06:47 Saving to outputFile=/var/lib/jenkins/workspace/bahir_spark_pr_builder/streaming-twitter/target/scalastyle-output.xml
11:06:47 Processed 8 file(s)
11:06:47 Found 2 errors
11:06:47 Found 0 warnings
11:06:47 Found 0 infos
...
11:06:47 [ERROR] Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.8.0:check (default-cli) on project spark-streaming-twitter_2.11: Failed during scalastyle execution: You have 2 Scalastyle violation(s). -> [Help 1]
```
---
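For reference, the "Use Javadoc style indentation for multiline comments" violation is typically about the alignment of continuation lines. A sketch of the flagged versus accepted form, assuming the default rule configuration (the comment text is illustrative, not from `TwitterUtils.scala`):

```scala
// Flagged: continuation lines of the block comment are not aligned
// Javadoc-style under the opening delimiter.
/*
   Create an input stream that returns tweets received from Twitter.
*/

// Accepted: each continuation line starts with a space and an
// asterisk aligned under the first `*` of the opening `/*`.
/*
 * Create an input stream that returns tweets received from Twitter.
 */
object CommentStyleExample
```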
[GitHub] bahir issue #43: [BAHIR-117] Expand filtering options for TwitterInputDStrea...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/43 retest this please ---
[GitHub] bahir-flink issue #7: [BAHIR-72][bahir-flink] support netty: pushed tcp/http...
Github user ckadner commented on the issue: https://github.com/apache/bahir-flink/pull/7 Sadly we did not have *Scalatest* enabled at the time this PR was reviewed, so we missed adding automated unit tests. I opened [BAHIR-113: Flink Netty connector missing automated unit tests](https://issues.apache.org/jira/browse/BAHIR-113) to keep track of that. @shijinkui -- would you be willing to take that on and open a PR for [BAHIR-113](https://issues.apache.org/jira/browse/BAHIR-113)? ---
[GitHub] bahir-flink issue #16: [BAHIR-112] Build Scala, enable Scalatest and Scalast...
Github user ckadner commented on the issue: https://github.com/apache/bahir-flink/pull/16 @lresende -- the license header is verified/enforced by Scalastyle rule ```xml ``` ---
[GitHub] bahir-flink issue #16: [BAHIR-112] Build Scala, enable Scalatest and Scalast...
Github user ckadner commented on the issue: https://github.com/apache/bahir-flink/pull/16 Second commit is to address **Scalastyle** check violations:
```Bash
mvn scalastyle:check -pl flink-connector-netty 2>&1 | grep "error file" | sed "s|$(pwd)||g"
```
```
error file=/flink-connector-netty/src/main/scala/org/apache/flink/streaming/connectors/netty/example/HttpReceiverSource.scala message=File must end with newline character
error file=/flink-connector-netty/src/main/scala/org/apache/flink/streaming/connectors/netty/example/NettyUtil.scala message=Header does not match expected text line=2
error file=/flink-connector-netty/src/main/scala/org/apache/flink/streaming/connectors/netty/example/TcpHandler.scala message=Header does not match expected text line=2
error file=/flink-connector-netty/src/main/scala/org/apache/flink/streaming/connectors/netty/example/TcpReceiverSource.scala message=Header does not match expected text line=2
error file=/flink-connector-netty/src/main/scala/org/apache/flink/streaming/connectors/netty/example/TcpReceiverSource.scala message=File must end with newline character
error file=/flink-connector-netty/src/main/scala/org/apache/flink/streaming/connectors/netty/example/TcpServer.scala message=Header does not match expected text line=2
error file=/flink-connector-netty/src/main/scala/org/apache/flink/streaming/connectors/netty/example/TcpServer.scala message=File must end with newline character
error file=/flink-connector-netty/src/test/scala/org/apache/flink/streaming/connectors/netty/example/StreamSqlExample.scala message=Header does not match expected text line=2
```
---
[GitHub] bahir-flink pull request #16: [BAHIR-112] Build Scala, enable Scalatest and ...
GitHub user ckadner opened a pull request: https://github.com/apache/bahir-flink/pull/16 [BAHIR-112] Build Scala, enable Scalatest and Scalastyle Issue link: [BAHIR-112: Maven reports "No sources to compile" in flink-connector-netty](https://issues.apache.org/jira/browse/BAHIR-112) The Maven build for *Bahir-Flink* is only set up for **Java** sources currently. However the module `flink-connector-netty` is written in **Scala**, so none of that code is being compiled, tested or verified. This PR adds the **Scala, Scalatest, Scalastyle** maven plugins to the root `pom.xml` and modifies some of the Scala sources to comply with the enforced coding style. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ckadner/bahir-flink BAHIR-112_build_Scala_sources Alternatively you can review and apply these changes as the patch at: https://github.com/apache/bahir-flink/pull/16.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16 commit 38c0c1ac386c56216d8c59343a7e4cf951601414 Author: Christian Kadner Date: 2017-04-26T20:30:56Z [BAHIR-112] Build Scala sources and enable Scalatest ---
[GitHub] bahir-flink issue #15: [BAHIR-111] Correcting imports of o.a.f.table.api.*
Github user ckadner commented on the issue: https://github.com/apache/bahir-flink/pull/15 @shijinkui -- since you contributed the `StreamSqlExample` could you please verify this fix? Thank you! ---
[GitHub] bahir-flink pull request #15: [BAHIR-111] Correcting imports of o.a.f.table....
GitHub user ckadner opened a pull request: https://github.com/apache/bahir-flink/pull/15 [BAHIR-111] Correcting imports of o.a.f.table.api.* Issue link: [BAHIR-111: IntelliJ reports compilation error for flink-connector-netty](https://issues.apache.org/jira/browse/BAHIR-111) This change addresses the following compilation error in IntelliJ:

```
Information:4/24/17, 4:25 PM - Compilation completed with 3 errors and 1 warning in 4s 337ms
Warning:scalac: there was one feature warning; re-run with -feature for details
.../flink-connector-netty/src/test/scala/org/apache/flink/streaming/connectors/netty/example/StreamSqlExample.scala
Error:Error:line (22) object table is not a member of package org.apache.flink.api.scala
    import org.apache.flink.api.scala.table._
Error:Error:line (23) object table is not a member of package org.apache.flink.api
    import org.apache.flink.api.table.TableEnvironment
Error:Error:line (45) not found: value TableEnvironment
    val tEnv = TableEnvironment.getTableEnvironment(env)
```

You can merge this pull request into a Git repository by running: $ git pull https://github.com/ckadner/bahir-flink patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/bahir-flink/pull/15.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15 commit 68d22c4cfd694cfbb1f4c80d5f1c4d0a4ae39521 Author: Christian Kadner Date: 2017-04-25T05:51:48Z [BAHIR-111] Correcting imports of o.a.f.table.api.* ---
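For readers hitting the same error: a hypothetical before/after sketch of the import change implied by the PR title (the exact package names depend on the Flink version in use; this is not quoted from the PR diff):

```Scala
// before (no longer resolves; the table API is not under o.a.f.api):
//   import org.apache.flink.api.scala.table._
//   import org.apache.flink.api.table.TableEnvironment

// after (the Table API lives in its own top-level package):
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.table.api.scala._
```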
[GitHub] bahir issue #28: [BAHIR-75] [WIP] Remote HDFS connector for Apache Spark usi...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/28 @sourav-mazumder -- do you have any updates on the progress of this PR? ---
[GitHub] bahir issue #38: [BAHIR-97] Akka as a streaming source for SQL Streaming.
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/38 Thanks @lresende and @sbcd90 - LGTM ---
[GitHub] bahir issue #38: [BAHIR-97] Akka as a streaming source for SQL Streaming.
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/38 @sbcd90 -- thank you for your continuous updates, and sorry for the piecemeal review from my end ... I started with your test cases since we were still in the process of fixing our Jenkins build setup. But your test cases are great now :-) Perhaps more importantly, my first request to you should have been to add a README and examples so users can start using your connector without having to read through too much code ... i.e. please further follow the precedent set by `sql-streaming-mqtt`. **Thank you!** ---
[GitHub] bahir pull request #38: [BAHIR-97] Akka as a streaming source for SQL Stream...
Github user ckadner commented on a diff in the pull request: https://github.com/apache/bahir/pull/38#discussion_r109263538

--- Diff: sql-streaming-akka/src/test/resources/feeder_actor.conf ---
```
@@ -0,0 +1,34 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+akka {
+  loglevel = "INFO"
+  actor {
+    provider = "akka.remote.RemoteActorRefProvider"
+  }
+  remote {
+    enabled-transports = ["akka.remote.netty.tcp"]
+    netty.tcp {
+      hostname = "127.0.0.1"
+      port = 0
+    }
+    log-sent-messages = on
+    log-received-messages = on
+  }
+  loggers.0 = "akka.event.slf4j.Slf4jLogger"
+  log-dead-letters-during-shutdown = "off"
+}
```
--- End diff --

add new line ---
[GitHub] bahir pull request #38: [BAHIR-97] Akka as a streaming source for SQL Stream...
Github user ckadner commented on a diff in the pull request: https://github.com/apache/bahir/pull/38#discussion_r109263328 --- Diff: sql-streaming-akka/src/main/assembly/assembly.xml --- @@ -0,0 +1,44 @@ + + +test-jar-with-dependencies + +jar + +false + + + + ${project.build.directory}/scala-${scala.binary.version}/test-classes + + + + + + +true +test +true + +org.apache.hadoop:*:jar +org.apache.zookeeper:*:jar +org.apache.avro:*:jar + + + + + --- End diff -- add new line ---
[GitHub] bahir pull request #38: [BAHIR-97] Akka as a streaming source for SQL Stream...
Github user ckadner commented on a diff in the pull request: https://github.com/apache/bahir/pull/38#discussion_r109263376 --- Diff: sql-streaming-akka/pom.xml --- @@ -0,0 +1,120 @@ + + + +http://maven.apache.org/POM/4.0.0"; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"; xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd";> +4.0.0 + +org.apache.bahir +bahir-parent_2.11 +2.2.0-SNAPSHOT +../pom.xml + + +org.apache.bahir +spark-sql-streaming-akka_2.11 + +sql-streaming-akka + +jar +Apache Bahir - Spark SQL Streaming Akka +http://bahir.apache.org + + + +org.apache.spark +spark-tags_${scala.binary.version} + + +org.apache.spark +spark-sql_${scala.binary.version} +${spark.version} + + +org.apache.spark +spark-sql_${scala.binary.version} +${spark.version} +test-jar +test + + +org.apache.spark +spark-core_${scala.binary.version} +${spark.version} +test-jar +test + + +${akka.group} +akka-actor_${scala.binary.version} +${akka.version} + + +${akka.group} +akka-remote_${scala.binary.version} +${akka.version} + + +${akka.group} +akka-slf4j_${scala.binary.version} +${akka.version} + + +org.rocksdb +rocksdbjni +5.1.2 + + + + + target/scala-${scala.binary.version}/classes + target/scala-${scala.binary.version}/test-classes + + + +org.apache.maven.plugins +maven-source-plugin + + + + +org.apache.maven.plugins +maven-assembly-plugin + + +test-jar-with-dependencies +package + +single + + + + spark-streaming-akka-test-${project.version} + ${project.build.directory}/scala-${scala.binary.version} +false + +false + + src/main/assembly/assembly.xml + + + + + + + + --- End diff -- add new line ---
[GitHub] bahir issue #38: [BAHIR-97] Akka as a streaming source for SQL Streaming.
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/38 @sbcd90 -- Thanks, much better! To remove the remaining noise, I believe you would have to add a dependency to `akka-slf4j` in `sql-streaming-akka/pom.xml` and configure Akka to use the `akka.event.slf4j.Slf4jLogger` ...

```XML
<dependency>
    <groupId>${akka.group}</groupId>
    <artifactId>akka-slf4j_${scala.binary.version}</artifactId>
    <version>${akka.version}</version>
</dependency>
```

```
akka.loggers.0 = "akka.event.slf4j.Slf4jLogger"
akka.log-dead-letters-during-shutdown = "off"
```

---
[GitHub] bahir issue #38: [BAHIR-97] Akka as a streaming source for SQL Streaming.
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/38 @sbcd90 -- your test cases ran fine, just with a lot of "noise". http://169.45.79.58:8080/job/bahir_spark_pr_builder/38/consoleFull

```
00:27:23 [INFO] --- scalatest-maven-plugin:1.0:test (test) @ spark-sql-streaming-akka_2.11 ---
...
00:27:24 Discovery starting.
00:27:24 log4j:ERROR Could not read configuration file from URL [file:src/test/resources/log4j.properties].
00:27:24 java.io.FileNotFoundException: src/test/resources/log4j.properties (No such file or directory)
...
00:27:24 log4j:ERROR Ignoring configuration file [file:src/test/resources/log4j.properties].
...
00:27:24 Discovery completed in 698 milliseconds.
00:27:24 Run starting. Expected test count is: 5
00:27:24 StressTestAkkaSource:
00:27:24 Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
...
00:28:23 Run completed in 59 seconds, 534 milliseconds.
00:28:23 Total number of tests run: 5
00:28:23 Suites: completed 4, aborted 0
00:28:23 Tests: succeeded 5, failed 0, canceled 0, ignored 0, pending 0
00:28:23 All tests passed.
```

Could you add a `log4j.properties` file in the test source folder to reduce the log verbosity? ---
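A minimal `log4j.properties` for `src/test/resources` might look like the following sketch. This assumes the module uses log4j 1.x like the log output above suggests; the WARN threshold and pattern are illustrative choices, not the file that was eventually committed:

```
# Hypothetical minimal log4j.properties to cut test log noise
log4j.rootLogger=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```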
[GitHub] bahir issue #39: [BAHIR-101] Initial code of SparkSQL for Cloudant
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/39 @yanglei99 -- Thank you for your PR. Please also include test case(s), example(s) and a README in this PR. ---
[GitHub] bahir issue #39: [BAHIR-101] Initial code of SparkSQL for Cloudant
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/39 ok to test ---
[GitHub] bahir issue #38: [BAHIR-97] Akka as a streaming source for SQL Streaming.
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/38 @sbcd90 -- can you change your test suite to choose the `akka.remote.netty.tcp.port` dynamically? ---
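One hypothetical way to pick the port dynamically (a sketch; `findFreePort` is an invented helper, not existing Bahir code): bind a `ServerSocket` to port 0 so the OS assigns an unused ephemeral port, then feed that port into the test's Akka config.

```Scala
import java.net.ServerSocket

// Ask the OS for an unused TCP port by binding to port 0,
// then release it so the test can bind that port itself.
def findFreePort(): Int = {
  val socket = new ServerSocket(0)
  try socket.getLocalPort
  finally socket.close()
}

// hypothetical usage with Typesafe Config:
//   val conf = ConfigFactory.parseString(
//     s"akka.remote.netty.tcp.port = ${findFreePort()}")
```

Note the small race: another process could grab the port between the `close()` and the test's own bind, so this reduces but does not fully eliminate port collisions. (Setting `port = 0` directly in the Akka config, as the `feeder_actor.conf` diff above does, avoids the race entirely when the test can ask Akka for the bound port afterwards.)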
[GitHub] bahir issue #38: [BAHIR-97] Akka as a streaming source for SQL Streaming.
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/38 retest this please ---
[GitHub] bahir issue #30: [MINOR] update ImportOrderChecker
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/30 retest this please ---
[GitHub] bahir issue #38: [BAHIR-97] Akka as a streaming source for SQL Streaming.
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/38 @sbcd90 -- Scalatests should be sufficient. We need to fix our Jenkins integration test setup. Not an action item for you :-) ---
[GitHub] bahir issue #38: [BAHIR-97] Akka as a streaming source for SQL Streaming.
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/38 Note, our Jenkins build server does not currently run Scalatests ...

> 17:20:55 No tests were executed.

```
17:20:54 [INFO]
17:20:54 [INFO] --- maven-surefire-plugin:2.19.1:test (default-test) @ spark-sql-streaming-akka_2.11 ---
17:20:54
17:20:54 ---
17:20:54  T E S T S
17:20:54 ---
17:20:54 OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
17:20:54
17:20:54 Results :
17:20:54
17:20:54 Tests run: 0, Failures: 0, Errors: 0, Skipped: 0
17:20:54
17:20:54 [INFO]
17:20:54 [INFO] --- maven-surefire-plugin:2.19.1:test (test) @ spark-sql-streaming-akka_2.11 ---
17:20:54 [INFO] Skipping execution of surefire because it has already been run for this configuration
17:20:54 [INFO]
17:20:54 [INFO] --- scalatest-maven-plugin:1.0:test (test) @ spark-sql-streaming-akka_2.11 ---
17:20:54 OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
17:20:55 Discovery starting.
17:20:55 Discovery completed in 36 milliseconds.
17:20:55 Run starting. Expected test count is: 0
17:20:55 DiscoverySuite:
17:20:55 Run completed in 100 milliseconds.
17:20:55 Total number of tests run: 0
17:20:55 Suites: completed 1, aborted 0
17:20:55 Tests: succeeded 0, failed 0, canceled 0, ignored 0, pending 0
17:20:55 No tests were executed.
17:20:55 [INFO]
```

---
[GitHub] bahir issue #38: [BAHIR-97] Akka as a streaming source for SQL Streaming.
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/38 http://169.45.79.58:8080/job/Apache%20Bahir%20-%20Pull%20Request%20Builder/35/console ---
[GitHub] bahir issue #38: [BAHIR-97] Akka as a streaming source for SQL Streaming.
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/38 retest this please ---
[GitHub] bahir issue #38: [BAHIR-97] Akka as a streaming source for SQL Streaming.
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/38 ok to test ---
[GitHub] bahir issue #37: [BAHIR-89] Multi topic support API for streaming MQTT
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/37 @anntinutj -- Thank you for adding a test case. This looks good to me. Two things to be added are a Python test and an example, but we could create a separate JIRA for that. @fbeneventi -- Did you get a chance to check out this PR? We'd appreciate your comments since you seem to have a real use case for it. ---
[GitHub] bahir pull request #37: [BAHIR-89] Multi topic support API for streaming MQT...
Github user ckadner commented on a diff in the pull request: https://github.com/apache/bahir/pull/37#discussion_r104118867 --- Diff: streaming-mqtt/src/main/scala/org/apache/spark/streaming/mqtt/MQTTUtils.scala --- @@ -199,7 +199,181 @@ object MQTTUtils { createStream(jssc.ssc, brokerUrl, topic, StorageLevel.MEMORY_AND_DISK_SER_2, Option(clientId), Option(username), Option(password), Option(cleanSession), None, None, None, None) } + /** + * Create an input stream that receives messages pushed by a MQTT publisher. + * @param ssc StreamingContext object + * @param brokerUrl Url of remote MQTT publisher + * @param topicsArray of topic names to subscribe to + * @param storageLevel RDD storage level. Defaults to StorageLevel.MEMORY_AND_DISK_SER_2. + */ + def createPairedStream( + ssc: StreamingContext, + brokerUrl: String, + topics: Array[String], + storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2 +): ReceiverInputDStream[(String, String)] = { +new MQTTPairedInputDStream(ssc, brokerUrl, topics, storageLevel) + } + + /** + * Create an input stream that receives messages pushed by a MQTT publisher. + * @param sscStreamingContext object + * @param brokerUrl Url of remote MQTT publisher + * @param topics Array of topic names to subscribe to + * @param storageLevel RDD storage level. Defaults to StorageLevel.MEMORY_AND_DISK_SER_2. 
+ * @param clientId ClientId to use for the mqtt connection + * @param username Username for authentication to the mqtt publisher + * @param password Password for authentication to the mqtt publisher + * @param cleanSession Sets the mqtt cleanSession parameter + * @param qosQuality of service to use for the topic subscription + * @param connectionTimeout Connection timeout for the mqtt connection + * @param keepAliveInterval Keepalive interal for the mqtt connection + * @param mqttVersionVersion to use for the mqtt connection + */ + def createPairedStream( + ssc: StreamingContext, + brokerUrl: String, + topics: Array[String], + storageLevel: StorageLevel, + clientId: Option[String], + username: Option[String], + password: Option[String], + cleanSession: Option[Boolean], + qos: Option[Int], + connectionTimeout: Option[Int], + keepAliveInterval: Option[Int], + mqttVersion: Option[Int] +): ReceiverInputDStream[(String, String)] = { +new MQTTPairedInputDStream(ssc, brokerUrl, topics, storageLevel, clientId, username, password, + cleanSession, qos, connectionTimeout, keepAliveInterval, mqttVersion) + } + + /** + * Create an input stream that receives messages pushed by a MQTT publisher. + * Storage level of the data will be the default StorageLevel.MEMORY_AND_DISK_SER_2. + * @param jssc JavaStreamingContext object + * @param brokerUrl Url of remote MQTT publisher + * @param topic Array of topic names to subscribe to + */ + def createPairedStream( + jssc: JavaStreamingContext, + brokerUrl: String, + topics: Array[String] +): JavaReceiverInputDStream[(String, String)] = { +implicitly[ClassTag[AnyRef]].asInstanceOf[ClassTag[String]] +createPairedStream(jssc.ssc, brokerUrl, topics) + } + + /** + * Create an input stream that receives messages pushed by a MQTT publisher. + * @param jssc JavaStreamingContext object + * @param brokerUrl Url of remote MQTT publisher + * @param topicsArray of topic names to subscribe to + * @param storageLevel RDD storage level. 
+ */ + def createPairedStream( + jssc: JavaStreamingContext, + brokerUrl: String, + topics: Array[String], + storageLevel: StorageLevel +): JavaReceiverInputDStream[(String, String)] = { +implicitly[ClassTag[AnyRef]].asInstanceOf[ClassTag[String]] +createPairedStream(jssc.ssc, brokerUrl, topics, storageLevel) + } + + /** + * Create an input stream that receives messages pushed by a MQTT publisher. + * @param jssc JavaStreamingContext object + * @param brokerUrl Url of remote MQTT publisher + * @param topic Array of topic names to subscribe to --- End diff -- should be `@param topics` (plural) ---
[GitHub] bahir pull request #37: [BAHIR-89] Multi topic support API for streaming MQT...
Github user ckadner commented on a diff in the pull request: https://github.com/apache/bahir/pull/37#discussion_r104118965 --- Diff: streaming-mqtt/.gitignore --- @@ -0,0 +1 @@ +/bin/ --- End diff -- why do we need this? ---
[GitHub] bahir pull request #37: [BAHIR-89] Multi topic support API for streaming MQT...
Github user ckadner commented on a diff in the pull request: https://github.com/apache/bahir/pull/37#discussion_r104118613 --- Diff: streaming-mqtt/src/main/scala/org/apache/spark/streaming/mqtt/MQTTUtils.scala --- @@ -199,7 +199,181 @@ object MQTTUtils { createStream(jssc.ssc, brokerUrl, topic, StorageLevel.MEMORY_AND_DISK_SER_2, Option(clientId), Option(username), Option(password), Option(cleanSession), None, None, None, None) } + /** + * Create an input stream that receives messages pushed by a MQTT publisher. + * @param ssc StreamingContext object + * @param brokerUrl Url of remote MQTT publisher + * @param topicsArray of topic names to subscribe to + * @param storageLevel RDD storage level. Defaults to StorageLevel.MEMORY_AND_DISK_SER_2. + */ + def createPairedStream( + ssc: StreamingContext, + brokerUrl: String, + topics: Array[String], + storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2 +): ReceiverInputDStream[(String, String)] = { +new MQTTPairedInputDStream(ssc, brokerUrl, topics, storageLevel) + } + + /** + * Create an input stream that receives messages pushed by a MQTT publisher. + * @param sscStreamingContext object + * @param brokerUrl Url of remote MQTT publisher + * @param topics Array of topic names to subscribe to + * @param storageLevel RDD storage level. Defaults to StorageLevel.MEMORY_AND_DISK_SER_2. 
+ * @param clientId ClientId to use for the mqtt connection + * @param username Username for authentication to the mqtt publisher + * @param password Password for authentication to the mqtt publisher + * @param cleanSession Sets the mqtt cleanSession parameter + * @param qosQuality of service to use for the topic subscription + * @param connectionTimeout Connection timeout for the mqtt connection + * @param keepAliveInterval Keepalive interal for the mqtt connection + * @param mqttVersionVersion to use for the mqtt connection + */ + def createPairedStream( + ssc: StreamingContext, + brokerUrl: String, + topics: Array[String], + storageLevel: StorageLevel, + clientId: Option[String], + username: Option[String], + password: Option[String], + cleanSession: Option[Boolean], + qos: Option[Int], + connectionTimeout: Option[Int], + keepAliveInterval: Option[Int], + mqttVersion: Option[Int] +): ReceiverInputDStream[(String, String)] = { +new MQTTPairedInputDStream(ssc, brokerUrl, topics, storageLevel, clientId, username, password, + cleanSession, qos, connectionTimeout, keepAliveInterval, mqttVersion) + } + + /** + * Create an input stream that receives messages pushed by a MQTT publisher. + * Storage level of the data will be the default StorageLevel.MEMORY_AND_DISK_SER_2. + * @param jssc JavaStreamingContext object + * @param brokerUrl Url of remote MQTT publisher + * @param topic Array of topic names to subscribe to --- End diff -- should be `@param topics` (plural) ---
[GitHub] bahir-flink issue #9: [BAHIR-85] move getCommandDescription to invoke method
Github user ckadner commented on the issue: https://github.com/apache/bahir-flink/pull/9 @atharvai @rmetzger -- are you still working on this?
- should it be marked as WIP?
- is it waiting for review?
- otherwise close it?

Thanks! ---
[GitHub] bahir issue #36: Fixes for akka example
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/36 @scottkwalker -- apologies for the delay, thanks for your code style fix! ---
[GitHub] bahir issue #36: Fixes for akka example
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/36 @scottkwalker -- thank you. LGTM. I will merge this tonight ---
[GitHub] bahir issue #35: [MINOR] Update comments
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/35 @prabeesh -- thanks for catching this inconsistency. I will merge this later today. Welcome to the [Apache Bahir](http://bahir.apache.org/) developer community! ---
[GitHub] bahir issue #28: [BAHIR-75] [WIP] Remote HDFS connector for Apache Spark usi...
Github user ckadner commented on the issue: https://github.com/apache/bahir/pull/28 @snowch the code snippet I put under usability in my comment was merely a suggestion for an alternative to using Hadoop configuration properties. I had intended the _servers.xml_ file to contain all of the user's remote Hadoop connections with _host_, _port_, _username_, _password_, etc. so that this type of configuration would not have to be done in the Spark program. All configuration files and the truststore file would reside on the Spark driver (master node). In terms of SSL validation, you could opt to bypass certificate validation. ---
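For illustration only, the _servers.xml_ idea described above might look like this sketch. Every element name and value here is invented for the example; no such file format exists in Bahir:

```XML
<!-- Hypothetical servers.xml kept on the Spark driver; all names illustrative -->
<servers>
  <server id="remote-hdfs-prod">
    <host>hadoop.example.com</host>
    <port>8443</port>
    <username>sparkuser</username>
    <password>secret</password>
    <truststore>/etc/spark/truststore.jks</truststore>
  </server>
</servers>
```

The Spark program would then reference a connection by its `id` instead of repeating host, credentials, and truststore settings in Hadoop configuration properties.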