Can spark convert String to Integer when reading using schema in structured streaming

2019-11-22 Thread Aniruddha P Tekade
Hi,

I am new to Spark and learning Spark Structured Streaming. I am specifying
the schema with the help of a case class and Encoders to get the streaming
DataFrame.

case class SampleLogEntry(
  dateTime: Timestamp,
  clientIp: String,
  userId: String,
  operation: String,
  bucketName: String,
  contAccUsrId: String,
  reqHeader: Integer,
  reqBody: Integer,
  respHeader: Integer,
  respBody: Integer,
  totalReqResSize: Integer,
  duration: Integer,
  objectName: String,
  httpStatus: Integer,
  s3ReqId: String,
  etag: String,
  errCode: Integer,
  srcBucket: String
)

val sampleLogSchema = Encoders.product[SampleLogEntry].schema // using encoders

val rawData = spark
  .readStream
  .format("csv") // CSV source, given the delimiter/header options below
  .option("delimiter", "|")
  .option("header", "true")
  .schema(sampleLogSchema)
  .load("/Users/home/learning-spark/logs")

However, I am getting only null values with this schema -

-------------------------------------------
Batch: 0
-------------------------------------------
+--------+----+------+-----+----------+------------+---------+-------+----------+--------+---------+--------+----------+----------+--------+---------+-------+---------+
|dateTime|  IP|userId|s3Api|bucketName|accessUserId|reqHeader|reqBody|respHeader|respBody|totalSize|duration|objectName|httpStatus|reqestId|objectTag|errCode|srcBucket|
+--------+----+------+-----+----------+------------+---------+-------+----------+--------+---------+--------+----------+----------+--------+---------+-------+---------+
|    null|null|  null| null|      null|        null|     null|   null|      null|    null|     null|    null|      null|      null|    null|     null|   null|     null|
|    null|null|  null| null|      null|        null|     null|   null|      null|    null|     null|    null|      null|      null|    null|     null|   null|     null|
|    null|null|  null| null|      null|        null|     null|   null|      null|    null|     null|    null|      null|      null|    null|     null|   null|     null|
|    null|null|  null| null|      null|        null|     null|   null|      null|    null|     null|    null|      null|      null|    null|     null|   null|     null|
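A note on the all-null output: Spark's CSV reader defaults to PERMISSIVE parse mode, and in that mode a row where any field fails to cast to the declared type comes back as all nulls. With a Timestamp column a common culprit is the date format in the log files. As a plain-Scala illustration of how strict timestamp parsing is (the exact parsing path inside Spark differs, but the strictness is the point):

```scala
import java.sql.Timestamp
import scala.util.Try

// Timestamp.valueOf accepts only the JDBC escape format
// yyyy-[m]m-[d]d hh:mm:ss[.f...]; anything else throws
// IllegalArgumentException, which PERMISSIVE mode turns into nulls.
val parsed = Try(Timestamp.valueOf("2019-11-22 10:15:30"))
val failed = Try(Timestamp.valueOf("22/Nov/2019:10:15:30"))

println(parsed.isSuccess) // true  -- matches the expected format
println(failed.isSuccess) // false -- a typical access-log format fails
```

Two standard reader options help here: `.option("timestampFormat", "<pattern matching the logs>")` to tell the parser the actual format, and `.option("mode", "FAILFAST")` to surface the underlying parse error instead of silent nulls.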

After trying multiple options, like deriving the schema from sample data and
defining the schema as a StructType, I changed every field in this schema to
String -

case class SampleLogEntry(
   dateTime: String,
   IP: String,
   userId: String,
   s3Api: String,
   bucketName: String,
   accessUserId: String,
   reqHeader: String,
   reqBody: String,
   respHeader: String,
   respBody: String,
   totalSize: String,
   duration: String,
   objectName: String,
   httpStatus: String,
   reqestId: String,
   objectTag: String,
   errCode: String,
   srcBucket: String
 )
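If reading every field as String works, the columns can be cast explicitly afterwards. A minimal sketch, assuming the all-String schema above and the `rawData` stream from the earlier snippet; the timestamp pattern is an assumption about the log format and should be replaced with the real one:

```scala
import org.apache.spark.sql.functions.{col, to_timestamp}
import org.apache.spark.sql.types.IntegerType

// Cast the String columns to their intended types.
// A failed cast yields null for that one column only (not the whole
// row), which makes the offending field much easier to spot.
val typedData = rawData
  .withColumn("dateTime", to_timestamp(col("dateTime"), "yyyy-MM-dd HH:mm:ss"))
  .withColumn("reqHeader", col("reqHeader").cast(IntegerType))
  .withColumn("reqBody", col("reqBody").cast(IntegerType))
  .withColumn("respHeader", col("respHeader").cast(IntegerType))
  .withColumn("respBody", col("respBody").cast(IntegerType))
  .withColumn("totalSize", col("totalSize").cast(IntegerType))
  .withColumn("duration", col("duration").cast(IntegerType))
  .withColumn("httpStatus", col("httpStatus").cast(IntegerType))
  .withColumn("errCode", col("errCode").cast(IntegerType))
```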


Re: SparkR integration with Hive 3 spark-r

2019-11-22 Thread Alfredo Marquez
Does anyone else have some insight to this question?

Thanks,

Alfredo

On Mon, Nov 18, 2019, 3:00 PM Alfredo Marquez 
wrote:

> Hello Nicolas,
>
> Well the issue is that with Hive 3, Spark gets its own metastore,
> separate from the Hive 3 metastore.  So how do you reconcile this
> separation of metastores?
>
> Can you continue to "enableHiveSupport" and be able to connect to Hive
> 3? Does this connection take advantage of Hive's LLAP?
>
> Our team doesn't believe that it's possible to make the connection as you
> would in the past.  But if it is that simple, I would be ecstatic 😁.
>
> Thanks,
>
> Alfredo
>
> On Mon, Nov 18, 2019, 12:53 PM Nicolas Paris 
> wrote:
>
>> Hi Alfredo
>>
>> my 2 cents:
>> To my knowledge, and reading the spark3 pre-release note, it will handle
>> hive metastore 2.3.5 - no mention of hive 3 metastore. I made several
>> tests on this in the past[1] and it seems to handle any hive metastore
>> version.
>>
>> However spark cannot read hive managed tables, AKA transactional tables.
>> So I would say you should be able to read any hive 3 regular table with
>> any of spark, pyspark or sparkR.
>>
>>
>> [1]
>> https://parisni.frama.io/posts/playing-with-hive-spark-metastore-versions/
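[For context on the metastore-version point above: Spark exposes configuration for pinning the metastore it talks to. A minimal sketch, with the version string as an example only, to be matched to the actual metastore:]

```scala
import org.apache.spark.sql.SparkSession

// Sketch: pointing Spark at an external Hive metastore.
// The version value is an assumption -- set it to the real metastore version.
val spark = SparkSession.builder()
  .appName("hive-metastore-example")
  .config("spark.sql.hive.metastore.version", "2.3.5")
  .config("spark.sql.hive.metastore.jars", "maven")
  .enableHiveSupport()
  .getOrCreate()
```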
>>
>> On Mon, Nov 18, 2019 at 11:23:50AM -0600, Alfredo Marquez wrote:
>> > Hello,
>> >
>> > Our company is moving to Hive 3, and they are saying that there is no
>> SparkR
>> > implementation in Spark 2.3.x + that will connect to Hive 3.  Is this
>> true?
>> >
>> > If it is true, will this be addressed in the Spark 3 release?
>> >
>> > I don't use python, so losing SparkR to get work done on Hadoop is a
>> huge loss.
>> >
>> > P.S. This is my first email to this community; if there is something I
>> should
>> > do differently, please let me know.
>> >
>> > Thank you
>> >
>> > Alfredo
>>
>> --
>> nicolas
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>