Re: Issues with reading gz files with Spark Streaming

2016-10-22 Thread Nkechi Achara
I do not use rename; the files are written to, and then moved into, a directory on HDFS in gz format.

On 22 October 2016 at 15:14, Steve Loughran <ste...@hortonworks.com> wrote:
> On 21 Oct 2016, at 15:53, Nkechi Achara <nkach...@googlemail.com> wrote:

Issues with reading gz files with Spark Streaming

2016-10-21 Thread Nkechi Achara
Hi, I am using Spark 1.5.0 to read gz files with textFileStream, but new files dropped into the specified directory are not picked up. I know this is only the case with gz files, as when I extract a file into the specified directory it is read and processed on the next window. My code is here:
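A minimal sketch of this kind of textFileStream setup (app name, batch interval, and input directory are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("gz-stream")
val ssc = new StreamingContext(conf, Seconds(30))

// textFileStream only picks up files whose modification time falls inside the
// current batch window, so files must be moved into the directory atomically
// after they are fully written
val lines = ssc.textFileStream("hdfs:///path/to/incoming")
lines.count().print()

ssc.start()
ssc.awaitTermination()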

Anyone got a good solid example of integrating Spark and Solr

2016-09-14 Thread Nkechi Achara
Hi All, I am trying to find some good examples of how to integrate Spark and Solr, and I am coming up blank. Basically, the spark-solr implementation does not seem to work correctly with CDH 5.5.2 (the 1.5.x branch), where I am receiving various issues relating to dependencies, which I have not been fully able to resolve.
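For reference, a minimal sketch of the documented spark-solr read path through the DataFrame API; the zkhost and collection values are placeholders, and the spark-solr version has to match the Spark line shipped with the CDH release:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("solr-read"))
val sqlContext = new SQLContext(sc)

// read a SolrCloud collection as a DataFrame via the spark-solr data source
val df = sqlContext.read.format("solr")
  .option("zkhost", "zkhost1:2181/solr")   // ZooKeeper ensemble for SolrCloud
  .option("collection", "my_collection")   // collection to query
  .load()
df.show()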

Spark, Kryo Serialization Issue with ProtoBuf field

2016-07-13 Thread Nkechi Achara
Hi, I am seeing an error when running my Spark job, relating to serialization of a protobuf field when transforming an RDD:

com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException
Serialization trace: otherAuthors_
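This UnsupportedOperationException is the classic symptom of Kryo's default FieldSerializer hitting the unmodifiable collections inside a generated protobuf message. A common workaround is to register the protobuf class with the ProtobufSerializer from chill-protobuf; MyProtoMessage below is a stand-in for the actual generated class:

import com.esotericsoftware.kryo.Kryo
import com.twitter.chill.protobuf.ProtobufSerializer
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

class ProtoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    // route the generated protobuf class through a protobuf-aware serializer
    kryo.register(classOf[MyProtoMessage], new ProtobufSerializer)
  }
}

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", classOf[ProtoRegistrator].getName)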

Having issues of passing properties to Spark in 1.5 in comparison to 1.2

2016-07-05 Thread Nkechi Achara
After using Spark 1.2 for quite a long time, I have realised that you can no longer pass Spark configuration to the driver via --conf on the command line (or, in my case, a shell script). I am thinking about using system properties and picking the config up using the following bit of code: def
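Worth noting: the no-argument SparkConf constructor already loads every JVM system property whose name starts with "spark.", so a helper along these lines is largely a safety net. A minimal sketch:

import org.apache.spark.SparkConf

// new SparkConf() defaults to loadDefaults = true, which copies all
// "spark."-prefixed system properties into the conf automatically
val conf = new SparkConf()

// explicit equivalent, for properties set after the conf was created
sys.props.foreach { case (k, v) =>
  if (k.startsWith("spark.")) conf.set(k, v)
}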

Using saveAsNewAPIHadoopDataset for Saving custom classes to Hbase

2016-04-22 Thread Nkechi Achara
Hi All, I am having a few issues saving my data to HBase. I have created a pair RDD for my custom class using the following:

val rdd1 = rdd.map { it => (getRowKey(it), it) }
val job = Job.getInstance(hConf)
val jobConf = job.getConfiguration
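A sketch of how the rest of the wiring typically looks; TableOutputFormat expects (ImmutableBytesWritable, Put) pairs, so the custom class has to be converted into a Put. The table name and column family/qualifier are placeholders, and getRowKey is assumed to return a String:

import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes

job.getConfiguration.set(TableOutputFormat.OUTPUT_TABLE, "my_table")
job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])

val puts = rdd1.map { case (rowKey, value) =>
  val put = new Put(Bytes.toBytes(rowKey))
  // serialize the custom class into a cell however is appropriate
  put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("payload"),
    Bytes.toBytes(value.toString))
  (new ImmutableBytesWritable, put)
}
puts.saveAsNewAPIHadoopDataset(job.getConfiguration)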

Using Spark to retrieve a HDFS file protected by Kerberos

2016-03-23 Thread Nkechi Achara
I am having issues setting up my Spark environment to read from a Kerberized HDFS file location. At the moment I have tried to do the following:

def ugiDoAs[T](ugi: Option[UserGroupInformation])(code: => T) = ugi match {
  case None => code
  case Some(u) => u.doAs(new
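The helper is cut off above; the standard shape of that pattern wraps the thunk in a PrivilegedExceptionAction, with the UGI obtained from a keytab login (principal and keytab path below are placeholders):

import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

def ugiDoAs[T](ugi: Option[UserGroupInformation])(code: => T): T = ugi match {
  case None => code
  case Some(u) => u.doAs(new PrivilegedExceptionAction[T] {
    override def run(): T = code
  })
}

// log in from a keytab and run the HDFS read inside doAs
val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
  "user@EXAMPLE.COM", "/path/to/user.keytab")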

Fwd: Issues with Long subtraction in an RDD when utilising tailrecursion

2016-01-26 Thread Nkechi Achara
On Tue, Jan 26 at 22:51, Ted Yu <yuzhih...@gmail.com> wrote:
> bq. (successfulAuction.timestampNanos - auction.timestampNanos) < 1000L &&
>
> Have you included the above condition into consideration when inspecting the timestamps of the results?

Issues with Long subtraction in an RDD when utilising tailrecursion

2016-01-26 Thread Nkechi Achara
I am having an issue with the subtraction of a Long within an RDD to filter out items in the RDD that are within a certain time range. So my code filters an
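A hypothetical reconstruction of the filter under discussion, using the condition quoted in Ted Yu's reply above; the Auction case class and the non-negativity check are assumptions added for illustration:

case class Auction(id: String, timestampNanos: Long)

// keep auctions that occurred at most 1000 ns before the successful one;
// without the second check, later auctions produce negative differences
// that also satisfy the < 1000L condition
def withinWindow(successfulAuction: Auction, auction: Auction): Boolean =
  (successfulAuction.timestampNanos - auction.timestampNanos) < 1000L &&
    successfulAuction.timestampNanos >= auction.timestampNanos

// usage: auctions.filter(a => withinWindow(successful, a))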

Any beginner samples for using ML / MLIB to produce a moving average of a (K, iterable[V])

2015-07-15 Thread Nkechi Achara
Hi all, I am trying to get some summary statistics to retrieve the moving average for several devices that each have an array of latencies in seconds, in this kind of format: deviceLatencyMap = [K: String, Iterable[V: Double]]. I understand that there is a MultivariateSummary, but as this is a trait,
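Since MultivariateSummary is only a trait and the MLlib summarizers produce whole-column statistics rather than a windowed average, a plain Scala sliding window over each key's values may be the simplest route. A minimal sketch with made-up device data:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setAppName("moving-avg").setMaster("local[*]"))

val deviceLatencyMap = sc.parallelize(Seq(
  ("deviceA", Seq(1.2, 0.9, 1.5, 2.0, 1.1)),
  ("deviceB", Seq(0.4, 0.6, 0.5, 0.7))
))

val windowSize = 3
// sliding yields overlapping windows; average each one
val movingAverages = deviceLatencyMap.mapValues { latencies =>
  latencies.sliding(windowSize).map(w => w.sum / w.size).toList
}
movingAverages.collect().foreach(println)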