Persistent disk IOPS and throughput check? Re: Running Spark code on multiple machines using Google Cloud Platform

2017-02-02 Thread Heji Kim
Dear Anahita, when we run performance tests for Spark/YARN clusters on GCP, we have to make sure we stay within the IOPS and throughput limits. Depending on the disk type (standard or SSD) and the size of the disk, you only get so much maximum sustained IOPS and throughput per second. The GCP instance metrics
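
A minimal sketch in Scala of the kind of sizing check described above. The per-GB rates below are placeholders for illustration only; the actual pd-standard and pd-ssd rates are published in the GCP persistent-disk documentation and change over time, so verify them there before relying on any numbers.

    case class DiskLimits(readIopsPerGb: Double, writeIopsPerGb: Double, throughputMBPerGb: Double)

    object PersistentDiskCheck {
      // Placeholder per-GB rates -- NOT authoritative; look up the current GCP values.
      val pdStandard = DiskLimits(readIopsPerGb = 0.75, writeIopsPerGb = 1.5, throughputMBPerGb = 0.12)
      val pdSsd      = DiskLimits(readIopsPerGb = 30.0, writeIopsPerGb = 30.0, throughputMBPerGb = 0.48)

      // Sustained caps scale with disk size, up to a per-instance ceiling not modeled here.
      def sustainedCaps(d: DiskLimits, sizeGb: Int): String =
        f"~${d.readIopsPerGb * sizeGb}%.0f read IOPS, " +
        f"~${d.writeIopsPerGb * sizeGb}%.0f write IOPS, " +
        f"~${d.throughputMBPerGb * sizeGb}%.0f MB/s throughput"

      def main(args: Array[String]): Unit = {
        println("500 GB pd-standard: " + sustainedCaps(pdStandard, 500))
        println("500 GB pd-ssd:      " + sustainedCaps(pdSsd, 500))
      }
    }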

Structured Streaming 2.1.0 Kafka driver --packages 'org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0' works on YARN but has trouble in standalone cluster mode

2017-01-26 Thread Heji Kim
Hello everyone, we are currently testing Structured Streaming Kafka drivers. We submit on YARN (2.7.3) with --packages 'org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0' without problems. However, when we try to launch on Spark standalone with deploy mode=cluster, we get the
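
For reference, a minimal Spark 2.1 Structured Streaming job that reads from the Kafka source shipped in that package; the broker addresses and topic name are hypothetical placeholders. Note that in standalone cluster mode the driver runs on a worker node rather than on the submitting machine, so the --packages dependencies have to be resolvable there as well.

    import org.apache.spark.sql.SparkSession

    object KafkaStructuredStreamingTest {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("kafka-structured-streaming-test").getOrCreate()

        // Broker list and topic name are made-up placeholders.
        val records = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
          .option("subscribe", "events")
          .load()

        // Kafka rows expose key/value as binary; cast them for a quick console sanity check.
        val query = records.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
          .writeStream
          .format("console")
          .start()

        query.awaitTermination()
      }
    }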

How to access metrics for Structured Streaming 2.1

2017-01-17 Thread Heji Kim
Hello. We are trying to migrate to and performance test the Kafka sink for Structured Streaming in 2.1. Obviously we miss the beautiful Streaming Statistics UI tab, and we are trying to figure out the most reasonable way to monitor event processing rates and lag time. 1. Are the SourceStatus and
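
One way to recover processing-rate numbers programmatically in 2.1 (a sketch, not necessarily what the thread settled on) is a StreamingQueryListener: each completed trigger delivers a StreamingQueryProgress with inputRowsPerSecond and processedRowsPerSecond, and the same information is also available from query.lastProgress / query.recentProgress.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.streaming.StreamingQueryListener
    import org.apache.spark.sql.streaming.StreamingQueryListener.{QueryProgressEvent, QueryStartedEvent, QueryTerminatedEvent}

    // Assumes an existing SparkSession; here one is obtained for illustration.
    val spark: SparkSession = SparkSession.builder.getOrCreate()

    spark.streams.addListener(new StreamingQueryListener {
      override def onQueryStarted(event: QueryStartedEvent): Unit = ()

      // Called once per completed trigger with rate and timing information.
      override def onQueryProgress(event: QueryProgressEvent): Unit = {
        val p = event.progress
        println(s"batch=${p.batchId} " +
                s"inputRowsPerSecond=${p.inputRowsPerSecond} " +
                s"processedRowsPerSecond=${p.processedRowsPerSecond} " +
                s"triggerDurationMs=${p.durationMs.get("triggerExecution")}")
      }

      override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
    })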