Re: Use Spark extension points to implement row-level security

2018-08-18 Thread Richard Siebeling
h the constructor and > not using the Scala getOrCreate() method (I've sent an email regarding > this). But other than that, it works. > > > On Fri, Aug 17, 2018, 03:56 Richard Siebeling > wrote: > >> Hi, >> >> I'd like to implement some kind of row-level secu

Use Spark extension points to implement row-level security

2018-08-17 Thread Richard Siebeling
Hi, I'd like to implement some kind of row-level security and am thinking of adding additional filters to the logical plan, possibly using the Spark extension points. Would this be feasible, for example using injectResolutionRule? Thanks in advance, Richard
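
For what it's worth, a minimal sketch of what that could look like with injectResolutionRule. The rule name, the tenant_id column, and the hard-coded tenant value are all made up for illustration; the guard in the first case keeps the rule idempotent, which matters because analyzer rules run to a fixed point:

  import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
  import org.apache.spark.sql.catalyst.expressions.{AttributeReference, EqualTo, Literal}
  import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan}
  import org.apache.spark.sql.catalyst.rules.Rule
  import org.apache.spark.sql.execution.datasources.LogicalRelation

  // Hypothetical rule: wrap every scan that exposes a "tenant_id" column in
  // a Filter restricting it to the current tenant.
  case class RowLevelSecurityRule(spark: SparkSession) extends Rule[LogicalPlan] {
    private val tenant = "acme" // in practice, resolve from the user/session context

    override def apply(plan: LogicalPlan): LogicalPlan = secure(plan)

    // Manual recursion so already-secured scans are skipped on later passes.
    private def secure(plan: LogicalPlan): LogicalPlan = plan match {
      case f @ Filter(EqualTo(a: AttributeReference, _: Literal), _: LogicalRelation)
          if a.name == "tenant_id" => f // already secured, leave the subtree alone
      case rel: LogicalRelation if rel.output.exists(_.name == "tenant_id") =>
        val col = rel.output.find(_.name == "tenant_id").get
        Filter(EqualTo(col, Literal(tenant)), rel)
      case other => other.mapChildren(secure)
    }
  }

  // Register the rule through the extension points when building the session.
  val spark = SparkSession.builder()
    .withExtensions((ext: SparkSessionExtensions) =>
      ext.injectResolutionRule(RowLevelSecurityRule))
    .getOrCreate()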

Determine Cook's distance / influential data points

2017-12-13 Thread Richard Siebeling
Hi, would it be possible to determine Cook's distance using Spark? Thanks, Richard
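
Not as a built-in, as far as I know, but Cook's distance only needs the residuals and the leverages, so it can be assembled by hand. A sketch, assuming you already have an RDD of feature vectors and an aligned RDD of residuals from a fitted linear model, with p the number of model parameters (including the intercept column if you added one):

  import breeze.linalg.{inv, DenseMatrix => BDM, DenseVector => BDV}
  import org.apache.spark.mllib.linalg.{Vector => MLVector}
  import org.apache.spark.mllib.linalg.distributed.RowMatrix
  import org.apache.spark.rdd.RDD

  // D_i = (r_i^2 / (p * s^2)) * h_ii / (1 - h_ii)^2,
  // with leverage h_ii = x_i^T (X^T X)^{-1} x_i.
  def cooksDistance(features: RDD[MLVector], residuals: RDD[Double], p: Int): RDD[Double] = {
    // X^T X is computed distributedly but is only p x p, so inverting it
    // locally with Breeze is cheap.
    val gram = new RowMatrix(features).computeGramianMatrix()
    val xtxInv = inv(new BDM(p, p, gram.toArray))
    val n = features.count()
    val s2 = residuals.map(r => r * r).sum() / (n - p) // residual variance estimate
    // zip assumes both RDDs come from the same source in the same order
    features.zip(residuals).map { case (x, r) =>
      val xv = BDV(x.toArray)
      val h = xv dot (xtxInv * xv)
      (r * r) / (p * s2) * h / ((1 - h) * (1 - h))
    }
  }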

Re: Handling skewed data

2017-04-19 Thread Richard Siebeling
I'm also interested in this; does anyone know? On 17 April 2017 at 17:17, Vishnu Viswanath wrote: > Hello All, > > Does anyone know if the skew handling code mentioned in this talk > https://www.youtube.com/watch?v=bhYV0JOPd9Y was added to spark? > > If so can I
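
In the meantime, the usual manual workaround for join skew is key salting: spread each key of the big side over several buckets and replicate the small side once per bucket, so a single hot key no longer lands in one partition. A toy sketch (the data and the salt count are made up):

  import scala.util.Random
  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().master("local[*]").appName("salting").getOrCreate()
  val sc = spark.sparkContext

  // The key "hot" dominates the left side of the join.
  val big = sc.parallelize(Seq.fill(100000)(("hot", 1)) ++ Seq(("cold", 2)))
  val small = sc.parallelize(Seq(("hot", "a"), ("cold", "b")))

  val salts = 8

  // Spread each key of the big side over `salts` buckets...
  val saltedBig = big.map { case (k, v) => ((k, Random.nextInt(salts)), v) }
  // ...and replicate the small side once per bucket so every pair still matches.
  val replSmall = small.flatMap { case (k, v) => (0 until salts).map(s => ((k, s), v)) }

  val joined = saltedBig.join(replSmall).map { case ((k, _), kv) => (k, kv) }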

Re: Fast write datastore...

2017-03-15 Thread Richard Siebeling
Maybe Apache Ignite fits your requirements. On 15 March 2017 at 08:44, vincent gromakowski < vincent.gromakow...@gmail.com> wrote: > Hi > If queries are static and filters are on the same columns, Cassandra is a > good option. > > On 15 March 2017 at 7:04 AM, "muthu" wrote

Re: Continuous or Categorical

2017-03-01 Thread Richard Siebeling
I think it's difficult to determine with certainty whether a variable is continuous or categorical. What to do when the values are numbers like 1, 2, 2, 3, 4, 5? These values could be either continuous or categorical. However, you could perform some checks: - are there any decimal values > it will
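
A rough sketch of such checks (the distinct-count threshold is an arbitrary assumption):

  import org.apache.spark.sql.{DataFrame, SparkSession}
  import org.apache.spark.sql.functions._

  // Heuristic: treat a numeric column as categorical when it has few
  // distinct values and no fractional parts; otherwise assume continuous.
  def looksCategorical(df: DataFrame, c: String, maxDistinct: Int = 20): Boolean = {
    val row = df.agg(
      countDistinct(col(c)).as("n"),
      sum(when(col(c) =!= floor(col(c)), 1).otherwise(0)).as("decimals")
    ).head()
    row.getAs[Long]("n") <= maxDistinct && row.getAs[Long]("decimals") == 0L
  }

  val spark = SparkSession.builder().master("local[*]").getOrCreate()
  import spark.implicits._
  println(looksCategorical(Seq(1, 2, 2, 3, 4, 5).toDF("x"), "x")) // true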

Re: is it possible to read .mdb file in spark

2017-01-26 Thread Richard Siebeling
Hi, haven't used it, but Jackcess should do the trick > http://jackcess.sourceforge.net/ Kind regards, Richard 2017-01-25 11:47 GMT+01:00 Selvam Raman : > > > -- > Selvam Raman > "Avoid bribes; stand tall" >
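
Since Jackcess has no Spark connector as far as I know, the simplest route is probably to read the .mdb on the driver and parallelize the rows. A sketch with made-up file and table names, reading every value as a string:

  import java.io.File
  import scala.collection.JavaConverters._
  import com.healthmarketscience.jackcess.DatabaseBuilder
  import org.apache.spark.sql.{Row, SparkSession}
  import org.apache.spark.sql.types.{StringType, StructField, StructType}

  val spark = SparkSession.builder().master("local[*]").getOrCreate()

  val db = DatabaseBuilder.open(new File("/path/to/data.mdb"))
  try {
    val table = db.getTable("Customers")
    val cols = table.getColumns.asScala.map(_.getName).toList
    // A Jackcess table iterates as rows of column-name -> value maps.
    val data = table.asScala.map(r => Row(cols.map(c => String.valueOf(r.get(c))): _*)).toList
    val schema = StructType(cols.map(StructField(_, StringType)))
    spark.createDataFrame(spark.sparkContext.parallelize(data), schema).show()
  } finally db.close()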

Re: Could not parse Master URL for Mesos on Spark 2.1.0

2017-01-09 Thread Richard Siebeling
changes of behaviour or changes in the build process or something like that, kind regards, Richard On 9 January 2017 at 22:55, Richard Siebeling <rsiebel...@gmail.com> wrote: > Hi, > > I'm setting up Apache Spark 2.1.0 on Mesos and I am getting a "Could not > p

Could not parse Master URL for Mesos on Spark 2.1.0

2017-01-09 Thread Richard Siebeling
Hi, I'm setting up Apache Spark 2.1.0 on Mesos and I am getting a "Could not parse Master URL: 'mesos://xx.xx.xxx.xxx:5050'" error. Mesos is running fine (both the master and the slave; it's a single-machine configuration). I really don't understand why this is happening since the same

Re: Best way to calculate intermediate column statistics

2016-08-25 Thread Richard Siebeling
On 24 August 2016 at 21:37, Richard Siebeling <rsiebel...@gmail.com> > wrote: > >> Hi Mich, >> >> I'd like to gather several statistics per column in order to make >> analysing data easier.

Re: Best way to calculate intermediate column statistics

2016-08-24 Thread Richard Siebeling
cache >> (persist) that result just after the calculation? >> Then you may aggregate statistics from the cached dataframe. >> This way it won't hit performance too much. >> >> Regards >> -- >> Bedrytski Aliaksandr >> sp...@bedryt.ski

Best way to calculate intermediate column statistics

2016-08-24 Thread Richard Siebeling
Hi, what is the best way to calculate intermediate column statistics, like the number of empty values and the number of distinct values for each column in a dataset, when aggregating or filtering data, next to the actual result of the aggregate or the filtered data? We are developing an application in
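
One way to keep this to a single pass is to build one agg() with a couple of expressions per column, next to the actual result. A sketch with toy data (column handling simplified):

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions._

  val spark = SparkSession.builder().master("local[*]").getOrCreate()
  import spark.implicits._

  val df = Seq(("a", Some(1)), ("", None), ("a", Some(3))).toDF("key", "value")

  // Two statistics per column: null/empty count and distinct count.
  val statCols = df.columns.flatMap { c =>
    Seq(
      count(when(col(c).isNull || col(c) === "", 1)).as(s"${c}_empty"),
      countDistinct(col(c)).as(s"${c}_distinct")
    )
  }
  df.agg(statCols.head, statCols.tail: _*).show()

If the statistics have to accompany a filtered or aggregated result, running this on a cached intermediate DataFrame, as suggested elsewhere in this thread, limits the cost to one extra pass.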

Re: Spark 2.0 - make-distribution fails while regular build succeeded

2016-08-04 Thread Richard Siebeling
Fixed! After adding the option -DskipTests everything built OK. Thanks Sean for your help. On Thu, Aug 4, 2016 at 8:18 PM, Richard Siebeling <rsiebel...@gmail.com> wrote: > I don't see any other errors, these are the last lines of the > make-distribution log. > Ab

Re: Spark 2.0 - make-distribution fails while regular build succeeded

2016-08-04 Thread Richard Siebeling
016 at 6:30 PM, Sean Owen <so...@cloudera.com> wrote: > That message is a warning, not error. It is just because you're cross > compiling with Java 8. If something failed it was elsewhere. > > > On Thu, Aug 4, 2016, 07:09 Richard Siebeling <rsiebel...@gmail.com> w

Spark 2.0 - make-distribution fails while regular build succeeded

2016-08-04 Thread Richard Siebeling
Hi, Spark 2.0 with the MapR Hadoop libraries was successfully built using the following command: ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.0-mapr-1602 -DskipTests clean package However, when I then try to build a runnable distribution using the following command ./dev/make-distribution.sh

Errors when running SparkPi on a clean Spark 1.6.1 on Mesos

2016-05-15 Thread Richard Siebeling
ja...@japila.pl> wrote the following: > On Sun, May 15, 2016 at 5:50 PM, Richard Siebeling <rsiebel...@gmail.com> > wrote: > > > I'm getting the following errors running SparkPi on a clean, freshly compiled > >

Re: Errors when running SparkPi on a clean Spark 1.6.1 on Mesos

2016-05-15 Thread Richard Siebeling
By the way, this is on a single-node cluster. On Sunday 15 May 2016, Richard Siebeling <rsiebel...@gmail.com> wrote the following: > Hi, > > I'm getting the following errors running SparkPi on a clean, freshly compiled > and checked Mesos 0.29.0 installation with Spark 1.6.1 >

Errors when running SparkPi on a clean Spark 1.6.1 on Mesos

2016-05-15 Thread Richard Siebeling
Hi, I'm getting the following errors running SparkPi on a clean, freshly compiled and checked Mesos 0.29.0 installation with Spark 1.6.1: 16/05/15 23:05:52 ERROR TaskSchedulerImpl: Lost executor e23f2d53-22c5-40f0-918d-0d73805fdfec-S0/0 on xxx Remote RPC client disassociated. Likely due to containers

Re: Split columns in RDD

2016-01-19 Thread Richard Siebeling
utsString = "TX,NV,WY" >>> val stringList = inputString.split(",") >>> (stringList, stringList.size) >>> } >>> >>> If you then wanted to find out how many state columns you should have in >>> your table you could use a

Split columns in RDD

2016-01-19 Thread Richard Siebeling
Hi, what is the most efficient way to split columns and know how many columns are created? Here is the current RDD:

ID  STATE
1   TX, NY, FL
2   CA, OH

This is the preferred output:

ID  STATE_1  STATE_2
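
A sketch of one way to do this with plain RDD operations: split once and cache, take the maximum width in that same cached pass, then pad every row to that width:

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("split-columns"))

  val rdd = sc.parallelize(Seq((1, "TX, NY, FL"), (2, "CA, OH")))

  // Split once and cache, because the result is needed twice:
  // once to find the widest row, once to build the padded output.
  val split = rdd.mapValues(_.split(",").map(_.trim)).cache()
  val width = split.values.map(_.length).max() // number of STATE_i columns needed

  val padded = split.mapValues(_.padTo(width, "")) // pad short rows with ""
  padded.collect().foreach { case (id, st) => println((id +: st).mkString("\t")) }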

Re: Split columns in RDD

2016-01-19 Thread Richard Siebeling
> Sab > On 19-Jan-2016 8:48 pm, "Richard Siebeling" <rsiebel...@gmail.com> wrote: > >> Hi, >> >> what is the most efficient way to split columns and know how many columns >> are created. >> >> Here is the current RDD >> ---

Stacking transformations and using intermediate results in the next transformation

2016-01-15 Thread Richard Siebeling
Hi, we're stacking multiple RDD operations on each other. For example, as a source we have an RDD[List[String]] like ["a", "b, c", "d"] ["a", "d, a", "d"] In the first step we split the second column into two columns, in the next step we filter the data on column 3 = "c", and in the final step we're
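
In code, the pipeline described above comes down to something like this (toy data as in the example):

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("stacking"))

  val source = sc.parallelize(Seq(
    List("a", "b, c", "d"),
    List("a", "d, a", "d")))

  // Step 1: split the second column into two columns.
  val split = source.map { row =>
    val parts = row(1).split(",").map(_.trim)
    List(row(0), parts(0), parts(1), row(2))
  }

  // Step 2: filter on the newly created third column.
  val filtered = split.filter(_(2) == "c")
  filtered.collect().foreach(println) // List(a, b, c, d)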

Re: ROSE: Spark + R on the JVM.

2016-01-13 Thread Richard Siebeling
Hi David, the use case is that we're building a data processing system with an intuitive user interface, where Spark is used as the data processing framework. We would like to provide an HTML user interface to R where the user types or copy-pastes his R code; the system should then send this R code

Re: ROSE: Spark + R on the JVM.

2016-01-12 Thread Richard Siebeling
Hi, this looks great and seems to be very usable. Would it be possible to access the session API from within ROSE, to get for example the images that are generated by R / openCPU and the output that R logs to stdout? Thanks in advance, Richard On Tue, Jan 12, 2016 at 10:16 PM, Vijay

Re: combining operations elegantly

2014-03-24 Thread Richard Siebeling
23, 2014 at 2:26 PM, Richard Siebeling rsiebel...@gmail.com wrote: Hi Koert, Patrick, do you already have an elegant solution to combine multiple operations on a single RDD? Say for example that I want to do a sum over one column, a count and an average over another column

Re: combining operations elegantly

2014-03-23 Thread Richard Siebeling
Hi Koert, Patrick, do you already have an elegant solution to combine multiple operations on a single RDD? Say, for example, that I want to do a sum over one column, and a count and an average over another column. Thanks in advance, Richard On Mon, Mar 17, 2014 at 8:20 AM, Richard Siebeling rsiebel
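
One pattern that avoids separate passes over the RDD is a single aggregate() with a small accumulator carrying all the statistics at once. A sketch with a made-up two-column RDD:

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("combine"))

  // Accumulator carrying all three statistics in one object.
  case class Stats(sum1: Double, count: Long, sum2: Double) {
    def add(c1: Double, c2: Double) = Stats(sum1 + c1, count + 1, sum2 + c2)
    def merge(o: Stats) = Stats(sum1 + o.sum1, count + o.count, sum2 + o.sum2)
  }

  val rows = sc.parallelize(Seq((1.0, 10.0), (2.0, 20.0), (3.0, 30.0)))

  val s = rows.aggregate(Stats(0, 0, 0))(
    (acc, r) => acc.add(r._1, r._2), // fold rows within each partition
    (a, b) => a.merge(b))            // merge per-partition results
  println(s"sum=${s.sum1} count=${s.count} avg=${s.sum2 / s.count}")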

Re: combining operations elegantly

2014-03-17 Thread Richard Siebeling
Patrick, Koert, I'm also very interested in these examples; could you please post them if you find them? Thanks in advance, Richard On Thu, Mar 13, 2014 at 9:39 PM, Koert Kuipers ko...@tresata.com wrote: not that long ago there was a nice example on here about how to combine multiple