[
https://issues.apache.org/jira/browse/BAHIR-186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760354#comment-16760354
]
ASF GitHub Bot commented on BAHIR-186:
--------------------------------------
ckadner commented on issue #75: [BAHIR-186] Reduce size of sql-cloudant test
database
URL: https://github.com/apache/bahir/pull/75#issuecomment-460471642
@emlaver -- your changes look good and help speeding up the test execution 👍
I do have an issue that predates your changes though. After running `mvn
clean install` and a few subsequent `mvn test -pl sql-cloudant` I keep running
into the following test error in `CloudantChangesDFSuite`:
```
- load data and verify deleted doc is not in results *** FAILED ***
org.apache.bahir.cloudant.common.CloudantException: Error retrieving
_changes feed data from database 'n_flight' with response code 401:
{"error":"unauthorized","reason":"Name or password is incorrect."}
```
When I run only the `CloudantChangesDFSuite` independently I do not get that
error.
```
$ mvn scalatest:test -pl sql-cloudant
-Dsuites='org.apache.bahir.cloudant.CloudantChangesDFSuite'
Run starting. Expected test count is: 9
CloudantChangesDFSuite:
Sql-cloudant tests that require Cloudant databases have been enabled by
the environment variables CLOUDANT_USER and CLOUDANT_PASSWORD.
- load and save data from Cloudant database
- load and count data from Cloudant search index
- load data and verify deleted doc is not in results
- load data and count rows in filtered dataframe
- save filtered dataframe to database
- save dataframe to database using createDBOnSave=true option
- load and count data from view
- load data from view with MapReduce function
- load data and verify total count of selector, filter, and view option
Run completed in 2 minutes, 40 seconds.
Total number of tests run: 9
Suites: completed 1, aborted 0
Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```
Maven log (console output) for all unit tests in SQL-Cloudant:
```
$ mvn test -pl sql-cloudant
...
[INFO] --- scalatest-maven-plugin:1.0:test (test) @ spark-sql-cloudant_2.11
---
Discovery starting.
Sql-cloudant tests that require Cloudant databases have been enabled by
the environment variables CLOUDANT_USER and CLOUDANT_PASSWORD.
...
CloudantChangesDFSuite:
- load and save data from Cloudant database
- load and count data from Cloudant search index
- load data and verify deleted doc is not in results *** FAILED ***
org.apache.bahir.cloudant.common.CloudantException: Error retrieving
_changes feed data from database 'n_flight' with response code 401:
{"error":"unauthorized","reason":"Name or password is incorrect."}
at org.apache.bahir.cloudant.DefaultSource.create(DefaultSource.scala:162)
at
org.apache.bahir.cloudant.DefaultSource.createRelation(DefaultSource.scala:95)
at
org.apache.bahir.cloudant.DefaultSource.createRelation(DefaultSource.scala:87)
at
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:341)
at
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
at
org.apache.bahir.cloudant.CloudantChangesDFSuite$$anonfun$8.apply$mcV$sp(CloudantChangesDFSuite.scala:67)
at
org.apache.bahir.cloudant.CloudantChangesDFSuite$$anonfun$8.apply(CloudantChangesDFSuite.scala:62)
at
org.apache.bahir.cloudant.CloudantChangesDFSuite$$anonfun$8.apply(CloudantChangesDFSuite.scala:62)
...
- load data and count rows in filtered dataframe
- save filtered dataframe to database
- save dataframe to database using createDBOnSave=true option
- load and count data from view
- load data from view with MapReduce function
- load data and verify total count of selector, filter, and view option
...
Run completed in 3 minutes, 34 seconds.
Total number of tests run: 25
Suites: completed 6, aborted 0
Tests: succeeded 24, failed 1, canceled 0, ignored 0, pending 0
*** 1 TEST FAILED ***
[INFO]
------------------------------------------------------------------------
[INFO] BUILD FAILURE
```
However, when I take a look at the `sql-cloudant/target/unit-tests.log` I
don't find any errors logged during the execution of that test case:
```
===== TEST OUTPUT FOR org.apache.bahir.cloudant.CloudantChangesDFSuite:
'load data and verify deleted doc is not in results' =====
19/02/04 15:07:11.114 ScalaTest-main-running-CloudantChangesDFSuite INFO
SharedState: loading hive config file:
jar:file:/Users/ckadner/.m2/repository/org/apache/spark/spark-sql_2.11/2.3.2/spark-sql_2.11-2.3.2-tests.jar!/hive-site.xml
19/02/04 15:07:11.130 ScalaTest-main-running-CloudantChangesDFSuite INFO
SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of
spark.sql.warehouse.dir
('file:/Users/ckadner/Projects/bahir_apache_merges/sql-cloudant/spark-warehouse/').
19/02/04 15:07:11.130 ScalaTest-main-running-CloudantChangesDFSuite INFO
SharedState: Warehouse path is
'file:/Users/ckadner/Projects/bahir_apache_merges/sql-cloudant/spark-warehouse/'.
19/02/04 15:07:11.132 ScalaTest-main-running-CloudantChangesDFSuite INFO
StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
19/02/04 15:07:11.288 ScalaTest-main-running-CloudantChangesDFSuite INFO
DefaultSource: Loading data from Cloudant using
https://testy-spark-connector.cloudant.com/n_flight/_changes?include_docs=true&feed=normal&seq_interval=200&timeout=60000&filter=_selector
19/02/04 15:07:11.292 streaming-start INFO ReceiverTracker: Starting 1
receivers
19/02/04 15:07:11.292 streaming-start INFO ReceiverTracker: ReceiverTracker
started
19/02/04 15:07:11.292 streaming-start INFO PluggableInputDStream: Slide time
= 8000 ms
19/02/04 15:07:11.293 streaming-start INFO PluggableInputDStream: Storage
level = Memory Deserialized 1x Replicated
19/02/04 15:07:11.293 streaming-start INFO PluggableInputDStream: Checkpoint
interval = null
19/02/04 15:07:11.293 streaming-start INFO PluggableInputDStream: Remember
interval = 8000 ms
19/02/04 15:07:11.293 streaming-start INFO PluggableInputDStream:
Initialized and validated
org.apache.spark.streaming.dstream.PluggableInputDStream@3fb41f46
19/02/04 15:07:11.293 streaming-start INFO ForEachDStream: Slide time = 8000
ms
19/02/04 15:07:11.293 streaming-start INFO ForEachDStream: Storage level =
Serialized 1x Replicated
19/02/04 15:07:11.293 streaming-start INFO ForEachDStream: Checkpoint
interval = null
19/02/04 15:07:11.293 streaming-start INFO ForEachDStream: Remember interval
= 8000 ms
19/02/04 15:07:11.293 streaming-start INFO ForEachDStream: Initialized and
validated org.apache.spark.streaming.dstream.ForEachDStream@14cb05dd
19/02/04 15:07:11.293 streaming-start INFO RecurringTimer: Started timer for
JobGenerator at time 1549321632000
19/02/04 15:07:11.294 streaming-start INFO JobGenerator: Started
JobGenerator at 1549321632000 ms
19/02/04 15:07:11.294 streaming-start INFO JobScheduler: Started JobScheduler
19/02/04 15:07:11.294 ScalaTest-main-running-CloudantChangesDFSuite INFO
StreamingContext: StreamingContext started
19/02/04 15:07:11.311 dag-scheduler-event-loop INFO DAGScheduler: Got job 0
(start at DefaultSource.scala:155) with 1 output partitions
19/02/04 15:07:11.311 dag-scheduler-event-loop INFO DAGScheduler: Final
stage: ResultStage 0 (start at DefaultSource.scala:155)
19/02/04 15:07:11.311 dag-scheduler-event-loop INFO DAGScheduler: Parents of
final stage: List()
19/02/04 15:07:11.311 dag-scheduler-event-loop INFO DAGScheduler: Missing
parents: List()
19/02/04 15:07:11.311 dispatcher-event-loop-2 INFO ReceiverTracker: Receiver
0 started
19/02/04 15:07:11.311 dag-scheduler-event-loop INFO DAGScheduler: Submitting
ResultStage 0 (Receiver 0 ParallelCollectionRDD[1] at makeRDD at
ReceiverTracker.scala:613), which has no missing parents
19/02/04 15:07:11.327 dag-scheduler-event-loop INFO MemoryStore: Block
broadcast_0 stored as values in memory (estimated size 62.3 KB, free 1638.5 MB)
19/02/04 15:07:11.329 dag-scheduler-event-loop INFO MemoryStore: Block
broadcast_0_piece0 stored as bytes in memory (estimated size 21.8 KB, free
1638.5 MB)
19/02/04 15:07:11.329 dispatcher-event-loop-3 INFO BlockManagerInfo: Added
broadcast_0_piece0 in memory on ckadner-mbp.hsd1.ca.comcast.net:62424 (size:
21.8 KB, free: 1638.6 MB)
19/02/04 15:07:11.329 dag-scheduler-event-loop INFO SparkContext: Created
broadcast 0 from broadcast at DAGScheduler.scala:1039
19/02/04 15:07:11.330 dag-scheduler-event-loop INFO DAGScheduler: Submitting
1 missing tasks from ResultStage 0 (Receiver 0 ParallelCollectionRDD[1] at
makeRDD at ReceiverTracker.scala:613) (first 15 tasks are for partitions
Vector(0))
19/02/04 15:07:11.330 dag-scheduler-event-loop INFO TaskSchedulerImpl:
Adding task set 0.0 with 1 tasks
19/02/04 15:07:11.332 dispatcher-event-loop-0 INFO TaskSetManager: Starting
task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0,
PROCESS_LOCAL, 9000 bytes)
19/02/04 15:07:11.332 Executor task launch worker for task 0 INFO Executor:
Running task 0.0 in stage 0.0 (TID 0)
19/02/04 15:07:11.341 Executor task launch worker for task 0 INFO
RecurringTimer: Started timer for BlockGenerator at time 1549321631400
19/02/04 15:07:11.341 Executor task launch worker for task 0 INFO
BlockGenerator: Started BlockGenerator
19/02/04 15:07:11.342 Thread-35 INFO BlockGenerator: Started block pushing
thread
19/02/04 15:07:11.342 dispatcher-event-loop-0 INFO ReceiverTracker:
Registered receiver for stream 0 from ckadner-mbp.hsd1.ca.comcast.net:62423
19/02/04 15:07:11.342 Executor task launch worker for task 0 INFO
ReceiverSupervisorImpl: Starting receiver 0
19/02/04 15:07:11.342 Executor task launch worker for task 0 INFO
ReceiverSupervisorImpl: Called receiver 0 onStart
19/02/04 15:07:11.342 Executor task launch worker for task 0 INFO
ReceiverSupervisorImpl: Waiting for receiver to be stopped
19/02/04 15:07:12.003 JobGenerator INFO JobScheduler: Added jobs for time
1549321632000 ms
19/02/04 15:07:12.003 JobScheduler INFO JobScheduler: Starting job streaming
job 1549321632000 ms.0 from job set of time 1549321632000 ms
19/02/04 15:07:12.051 streaming-job-executor-0 INFO SparkContext: Starting
job: json at DefaultSource.scala:150
19/02/04 15:07:12.052 streaming-job-executor-0 INFO DAGScheduler: Job 1
finished: json at DefaultSource.scala:150, took 0.000025 s
19/02/04 15:07:12.063 dispatcher-event-loop-1 INFO ReceiverTracker: Sent
stop signal to all 1 receivers
19/02/04 15:07:12.063 dispatcher-event-loop-2 INFO ReceiverSupervisorImpl:
Received stop signal
19/02/04 15:07:12.063 dispatcher-event-loop-2 INFO ReceiverSupervisorImpl:
Stopping receiver with message: Stopped by driver:
19/02/04 15:07:12.063 dispatcher-event-loop-2 INFO ReceiverSupervisorImpl:
Called receiver onStop
19/02/04 15:07:12.063 dispatcher-event-loop-2 INFO ReceiverSupervisorImpl:
Deregistering receiver 0
19/02/04 15:07:12.064 dispatcher-event-loop-3 ERROR ReceiverTracker:
Deregistered receiver for stream 0: Stopped by driver
19/02/04 15:07:12.064 dispatcher-event-loop-2 INFO ReceiverSupervisorImpl:
Stopped receiver 0
19/02/04 15:07:12.064 dispatcher-event-loop-2 INFO BlockGenerator: Stopping
BlockGenerator
19/02/04 15:07:12.402 dispatcher-event-loop-2 INFO RecurringTimer: Stopped
timer for BlockGenerator after time 1549321632400
19/02/04 15:07:12.402 dispatcher-event-loop-2 INFO BlockGenerator: Waiting
for block pushing thread to terminate
19/02/04 15:07:12.406 Thread-35 INFO BlockGenerator: Pushing out the last 0
blocks
19/02/04 15:07:12.406 Thread-35 INFO BlockGenerator: Stopped block pushing
thread
19/02/04 15:07:12.406 dispatcher-event-loop-2 INFO BlockGenerator: Stopped
BlockGenerator
19/02/04 15:07:12.406 Executor task launch worker for task 0 INFO
ReceiverSupervisorImpl: Stopped receiver without error
19/02/04 15:07:12.407 Executor task launch worker for task 0 INFO Executor:
Finished task 0.0 in stage 0.0 (TID 0). 751 bytes result sent to driver
19/02/04 15:07:12.407 task-result-getter-0 INFO TaskSetManager: Finished
task 0.0 in stage 0.0 (TID 0) in 1076 ms on localhost (executor driver) (1/1)
19/02/04 15:07:12.407 task-result-getter-0 INFO TaskSchedulerImpl: Removed
TaskSet 0.0, whose tasks have all completed, from pool
19/02/04 15:07:12.408 dag-scheduler-event-loop INFO DAGScheduler:
ResultStage 0 (start at DefaultSource.scala:155) finished in 1.096 s
19/02/04 15:07:12.408 streaming-job-executor-0 INFO ReceiverTracker: All of
the receivers have deregistered successfully
19/02/04 15:07:12.408 streaming-job-executor-0 INFO ReceiverTracker:
ReceiverTracker stopped
19/02/04 15:07:12.408 streaming-job-executor-0 INFO JobGenerator: Stopping
JobGenerator immediately
19/02/04 15:07:12.409 streaming-job-executor-0 INFO RecurringTimer: Stopped
timer for JobGenerator after time 1549321632000
19/02/04 15:07:12.409 streaming-job-executor-0 INFO JobGenerator: Stopped
JobGenerator
19/02/04 15:07:12.687 Spark Context Cleaner INFO ContextCleaner: Cleaned
accumulator 470
<<19/02/04 15:07:12. ... Spark Context Cleaner INFO ContextCleaner: Cleaned
accumulator ...>>
19/02/04 15:07:12.688 Spark Context Cleaner INFO ContextCleaner: Cleaned
accumulator 479
19/02/04 15:07:12.690 dispatcher-event-loop-3 INFO BlockManagerInfo: Removed
broadcast_0_piece0 on ckadner-mbp.hsd1.ca.comcast.net:62424 in memory (size:
21.8 KB, free: 1638.6 MB)
19/02/04 15:07:12.693 Spark Context Cleaner INFO ContextCleaner: Cleaned
accumulator 467
<<19/02/04 15:07:12. ... Spark Context Cleaner INFO ContextCleaner: Cleaned
accumulator ...>>
19/02/04 15:07:12.693 Spark Context Cleaner INFO ContextCleaner: Cleaned
accumulator 486
19/02/04 15:07:14.412 streaming-job-executor-0 INFO JobScheduler: Stopped
JobScheduler
19/02/04 15:07:14.413 streaming-job-executor-0 INFO StreamingContext:
StreamingContext stopped successfully
19/02/04 15:07:14.415 ScalaTest-main-running-CloudantChangesDFSuite INFO
CloudantChangesDFSuite:
===== FINISHED org.apache.bahir.cloudant.CloudantChangesDFSuite: 'load data
and verify deleted doc is not in results' =====
```
I do however find that error message earlier in the log during the test case
`incorrect password throws an error message for changes receiver` where it is
expected:
```
===== TEST OUTPUT FOR org.apache.bahir.cloudant.CloudantOptionSuite:
'incorrect password throws an error message for changes receiver' =====
...
19/02/04 15:06:32.751 Cloudant Receiver ERROR CookieInterceptor: Credentials
are incorrect for server
https://testy-spark-connector.cloudant.com:443/_session, cookie authentication
will not be attempted again by this interceptor object
19/02/04 15:06:33.909 Cloudant Receiver WARN ReceiverSupervisorImpl:
Reported error Error retrieving _changes feed data from database 'n_flight'
with response code 401: {"error":"unauthorized","reason":"Name or password is
incorrect."} - org.apache.bahir.cloudant.common.CloudantException: Error
retrieving _changes feed data from database 'n_flight' with response code 401:
{"error":"unauthorized","reason":"Name or password is incorrect."}
19/02/04 15:06:33.912 dispatcher-event-loop-1 WARN ReceiverTracker: Error
reported by receiver for stream 0: Error retrieving _changes feed data from
database 'n_flight' with response code 401:
{"error":"unauthorized","reason":"Name or password is incorrect."} -
org.apache.bahir.cloudant.common.CloudantException: Error retrieving _changes
feed data from database 'n_flight' with response code 401:
{"error":"unauthorized","reason":"Name or password is incorrect."}
at
org.apache.bahir.cloudant.internal.ChangesReceiver.org$apache$bahir$cloudant$internal$ChangesReceiver$$receive(ChangesReceiver.scala:79)
at
org.apache.bahir.cloudant.internal.ChangesReceiver$$anon$1.run(ChangesReceiver.scala:37)
...
19/02/04 15:06:44.445 ScalaTest-main-running-CloudantOptionSuite INFO
CloudantOptionSuite:
===== FINISHED org.apache.bahir.cloudant.CloudantOptionSuite: 'incorrect
password throws an error message for changes receiver' =====
```
When I look at the sources here
https://github.com/apache/bahir/blob/c51853d135ad2d9da67804259f4ed0e29223afb3/sql-cloudant/src/main/scala/org/apache/bahir/cloudant/DefaultSource.scala#L159-L163
...it appears that this error is a fluke? The test case `load data and
verify deleted doc is not in results` expects the schema to be empty but the
`DefaultSource` treats that as an error and throws up the last
`receiverErrorMsg` which in this case comes from the earlier run test case
`incorrect password throws an error message for changes receiver`.
At least I thought that at first sight, but the test case does not seem to
expect an empty schema
https://github.com/apache/bahir/blob/c51853d135ad2d9da67804259f4ed0e29223afb3/sql-cloudant/src/test/scala/org/apache/bahir/cloudant/CloudantChangesDFSuite.scala#L67-L69
So I am not sure what is happening. But you might? :-)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Support SSL connection in MQTT SQL Streaming
> --------------------------------------------
>
> Key: BAHIR-186
> URL: https://issues.apache.org/jira/browse/BAHIR-186
> Project: Bahir
> Issue Type: New Feature
> Components: Spark Streaming Connectors
> Reporter: Lukasz Antoniak
> Assignee: Lukasz Antoniak
> Priority: Major
> Fix For: Spark-2.4.0
>
>
> Mailing list discussion:
> https://www.mail-archive.com/[email protected]/msg00022.html.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)