collect_list alternative for SQLContext?

2016-10-24 Thread Matt Smith
Is there an alternative function or design pattern for the collect_list UDAF that can be used without taking a dependency on HiveContext? How does one typically roll things up into an array when outputting JSON?

Re: Mini-Proposal: Make it easier to contribute to the contributing to Spark Guide

2016-10-24 Thread Mark Hamstra
The advice to avoid idioms that may not be universally understood is good. My further issue with the misuse of "straw-man" (which really is not, or should not be, separable from "straw-man argument") is that a "straw-man" in the established usage is something that is always intended to be a

Re: Mini-Proposal: Make it easier to contribute to the contributing to Spark Guide

2016-10-24 Thread Sean Owen
Well, it's more of a reference to the fallacy than anything. Writing down a proposed action implicitly claims it's what others are arguing for. It's self-deprecating to call it a "straw man", suggesting that it may not at all be what others are arguing for, and is done to openly invite criticism

Re: Mini-Proposal: Make it easier to contribute to the contributing to Spark Guide

2016-10-24 Thread Mark Hamstra
Alright, that does it! Who is responsible for this "straw-man" abuse that is becoming too commonplace in the Spark community? "Straw-man" does not mean something like "trial balloon" or "run it up the flagpole and see if anyone salutes", and I would really appreciate it if Spark developers would

Re: Dynamic Graph Handling

2016-10-24 Thread Joseph E. Gonzalez
What kind of partitioning are you exploring? GraphX actually has some built-in partitioning algorithms, but if you are interested in spectral or hierarchical methods you might want to look at Metis/Zoltan. There was some interest in integrating Metis-style algorithms in Spark (GraphX or

Re: Dynamic Graph Handling

2016-10-24 Thread Jörn Franke
Maybe titandb ?! It uses HBase to store graphs and Solr (on HDFS) to index graphs. I am not 100% sure it supports it, but probably. It can also integrate Spark, but analytics on a given graph only. Otherwise you need to go for a dedicated graph system. > On 24 Oct 2016, at 16:41, Marco

Dynamic Graph Handling

2016-10-24 Thread Marco
Hi, I'm a student in Computer Science and I'm working on my master's thesis on the Graph Partitioning problem, focusing on dynamic graphs. I'm searching for a framework to manage dynamic graphs, with possible disappearance of edges/nodes. Now the problem is: GraphX alone cannot provide

Re: Mini-Proposal: Make it easier to contribute to the contributing to Spark Guide

2016-10-24 Thread Sean Owen
BTW I wrote up a straw-man proposal for migrating the wiki content: https://issues.apache.org/jira/browse/SPARK-18073 On Tue, Oct 18, 2016 at 12:25 PM Holden Karau wrote: > Right now the wiki isn't particularly accessible to updates by external > contributors. We've

unsubscribe

2016-10-24 Thread Chen Qiming
unsubscribe