Re: spark dataframe jdbc Amazon RDS problem

2017-08-26 Thread 刘虓
my code is here:

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()
    mysql_jdbc_url = 'mydb/test'
    table = "test"
    props = {"user": "myname", "password": 'mypassword'}
    df = spark.read.jdbc(mysql_jdbc_url,table,properties=props)
    df.printSchema()
    wtf = df.collect()
    for i

spark dataframe jdbc Amazon RDS problem

2017-08-26 Thread 刘虓
Hi all, I came across this problem yesterday: I was using a DataFrame to read from an Amazon RDS MySQL table, and this exception came up: java.sql.SQLException: Invalid value for getLong() - 'id' at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:964) at
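
One thing that may be worth checking for an exception like the one above, offered as a possible cause rather than a confirmed diagnosis: Spark chooses its JDBC dialect from the URL prefix. When the URL does not start with jdbc:mysql, the generic dialect quotes column names with double quotes, and MySQL (without ANSI_QUOTES) reads "id" as the string literal 'id', which then fails in getLong(). The sketch below builds and checks the URL before reading. The helper names, host, and port are hypothetical, and the driver class is assumed from the com.mysql.jdbc frames in the stack trace.

```python
# Minimal sketch: make sure the JDBC URL carries the MySQL prefix before
# handing it to Spark. All helper names here are illustrative only.

MYSQL_PREFIX = "jdbc:mysql://"


def build_mysql_jdbc_url(host, port, database):
    """Return a URL of the form jdbc:mysql://host:port/database."""
    return f"{MYSQL_PREFIX}{host}:{port}/{database}"


def looks_like_mysql_url(url):
    """True if the URL starts with the MySQL JDBC prefix."""
    return url.startswith(MYSQL_PREFIX)


def read_table(spark, url, table, user, password):
    """Read a table through spark.read.jdbc after validating the URL.

    `spark` is an existing SparkSession supplied by the caller.
    """
    if not looks_like_mysql_url(url):
        raise ValueError(f"expected a URL starting with {MYSQL_PREFIX!r}, got {url!r}")
    props = {
        "user": user,
        "password": password,
        # Assumption: Connector/J 5.x, as suggested by the stack trace.
        "driver": "com.mysql.jdbc.Driver",
    }
    return spark.read.jdbc(url, table, properties=props)
```

With a real session this would be called as read_table(spark, build_mysql_jdbc_url("my-rds-host", 3306, "test"), "test", "myname", "mypassword"), where "my-rds-host" stands in for the actual RDS endpoint.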

Kafka Consumer Pre Fetch Messages + Async commits

2017-08-26 Thread Julia Wistance
Hi Experts, A question on what could potentially happen with Spark Streaming 2.2.0 + Kafka. LocationStrategies says that "new Kafka consumer API will pre-fetch messages into buffers." If we store offsets in Kafka, currently we can only use async commits. So, 1 - Could it happen that we commit