Hi,
I am new to Spark and learning Spark Structured Streaming. I am using
Structured Streaming with the schema specified via a case class and
encoders to get the streaming DataFrame:
import java.sql.Timestamp
import org.apache.spark.sql.Encoders

case class SampleLogEntry(
dateTime: Timestamp,
clientIp: String,
userId: String,
operation: String,
bucketName: String,
contAccUsrId: String,
reqHeader: Integer,
reqBody: Integer,
respHeader: Integer,
respBody: Integer,
totalReqResSize: Integer,
duration: Integer,
objectName: String,
httpStatus: Integer,
s3ReqId: String,
etag: String,
errCode: Integer,
srcBucket: String
)
val sampleLogSchema = Encoders.product[SampleLogEntry].schema // using encoders
val rawData = spark
  .readStream
  .format("csv")
  .option("delimiter", "|")
  .option("header", "true")
  .schema(sampleLogSchema)
  .load("/Users/home/learning-spark/logs")
However, I am getting only null values with this schema -
-------------------------------------------
Batch: 0
-------------------------------------------
+--------+----+------+-----+----------+------------+---------+-------+----------+--------+---------+--------+----------+----------+--------+---------+-------+---------+
|dateTime|  IP|userId|s3Api|bucketName|accessUserId|reqHeader|reqBody|respHeader|respBody|totalSize|duration|objectName|httpStatus|reqestId|objectTag|errCode|srcBucket|
+--------+----+------+-----+----------+------------+---------+-------+----------+--------+---------+--------+----------+----------+--------+---------+-------+---------+
|    null|null|  null| null|      null|        null|     null|   null|      null|    null|     null|    null|      null|      null|    null|     null|   null|     null|
|    null|null|  null| null|      null|        null|     null|   null|      null|    null|     null|    null|      null|      null|    null|     null|   null|     null|
|    null|null|  null| null|      null|        null|     null|   null|      null|    null|     null|    null|      null|      null|    null|     null|   null|     null|
|    null|null|  null| null|      null|        null|     null|   null|      null|    null|     null|    null|      null|      null|    null|     null|   null|     null|
+--------+----+------+-----+----------+------------+---------+-------+----------+--------+---------+--------+----------+----------+--------+---------+-------+---------+
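For reference, this is roughly the static read I have been using to check
whether the schema itself can parse the files; the csv format and the
timestampFormat value below are my assumptions about the log layout, not
something confirmed:

import org.apache.spark.sql.Encoders

val sampleLogSchema = Encoders.product[SampleLogEntry].schema

// Batch read of the same directory, so any parsing problem shows up
// without involving the streaming query.
val staticCheck = spark
  .read
  .format("csv")
  .option("delimiter", "|")
  .option("header", "true")
  .option("timestampFormat", "yyyy-MM-dd HH:mm:ss") // assumed timestamp pattern
  .schema(sampleLogSchema)
  .load("/Users/home/learning-spark/logs")

staticCheck.show(5, truncate = false)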
After trying multiple options, such as inferring the schema from sample
data and defining the schema as a StructType, I changed every field in
this schema to String -
case class SampleLogEntry(
dateTime: String,
IP: String,
userId: String,
s3Api: String,
bucketName: String,
accessUserId: String,
reqHeader: String,
reqBody: String,
respHeader: String,
respBody: String,
totalSize: String,
duration: String,
objectName: String,
httpStatus: String,
reqestId: String,
objectTag: String,
errCode: String,
srcBucket: String
)
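If the all-String read returns the raw values, the idea would then be to
cast the columns back to the intended types after reading; a rough sketch
(the timestamp pattern is again my assumption about the log format):

import org.apache.spark.sql.functions.{col, to_timestamp}

// Cast the String columns back to the intended types after the raw read.
val typedData = rawData
  .withColumn("dateTime", to_timestamp(col("dateTime"), "yyyy-MM-dd HH:mm:ss"))
  .withColumn("reqHeader", col("reqHeader").cast("int"))
  .withColumn("reqBody", col("reqBody").cast("int"))
  .withColumn("respHeader", col("respHeader").cast("int"))
  .withColumn("respBody", col("respBody").cast("int"))
  .withColumn("totalSize", col("totalSize").cast("int"))
  .withColumn("duration", col("duration").cast("int"))
  .withColumn("httpStatus", col("httpStatus").cast("int"))
  .withColumn("errCode", col("errCode").cast("int"))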