Hey All,
I've mostly kept quiet since I am not very active in maintaining this
code anymore. However, it is a bit odd that the project is
split-brained with a lot of the code being on github and some in the
Spark repo.
If the consensus is to migrate everything to github, that seems okay
with me.
It would also be great to test this with codegen and unsafe enabled
while continuing to use the sort shuffle manager instead of the new
tungsten-sort one.
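If I read the suggestion right, that combination would correspond to settings like these (a sketch, assuming the Spark 1.4/1.5-era property names):

```
spark.shuffle.manager=sort
spark.sql.codegen=true
spark.sql.unsafe.enabled=true
```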
On Fri, Jul 31, 2015 at 1:39 AM, Reynold Xin r...@databricks.com wrote:
Is this deterministically reproducible? Can you try this on the
Sweet! It's here:
https://issues.apache.org/jira/browse/SPARK-9141?focusedCommentId=14649437&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14649437
On Tue, Jul 28, 2015 at 11:21 PM Michael Armbrust mich...@databricks.com
wrote:
Can you add your description of the
this looks like a mistake in FrequentItems to me. if the map is full
(map.size==size) then it should still add the new item (after removing
items from the map and decrementing counts).
if it's not a mistake then at least it looks to me like the algo is
different from the one described in the paper. is
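For context, here is a minimal sketch (plain Scala, not Spark's actual FrequentItems code; the class and method names are made up) of the update rule being argued for: when the map is full, decrement every counter, evict zeros, and then still insert the new item:

```scala
import scala.collection.mutable

// Minimal frequent-items summary in the Misra-Gries style. This follows
// the behavior argued for above (insert the new item after decrementing),
// which is not necessarily what Spark's FrequentItems actually does.
class FreqItemsSketch[T](size: Int) {
  val counts = mutable.Map.empty[T, Long]

  def add(item: T): Unit = {
    if (counts.contains(item)) {
      counts(item) += 1
    } else if (counts.size < size) {
      counts(item) = 1L
    } else {
      // Map is full: decrement all counters, drop those that reach zero...
      // (iterate over a snapshot so we can mutate the map safely)
      for ((k, v) <- counts.toList) {
        if (v == 1L) counts.remove(k) else counts(k) = v - 1L
      }
      // ...and, per the argument above, still add the new item.
      // Note: if no counter reached zero, this can transiently exceed `size`.
      counts(item) = 1L
    }
  }
}
```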
Hi,
there is one cost-based optimization implemented in Spark SQL, if I'm not
mistaken, regarding join operations:
if a join involves a small dataset, then Spark SQL's
strategy will be to automatically broadcast the small dataset instead of
shuffling it.
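For reference, the size threshold that drives this automatic broadcast is configurable; a sketch for the 1.x-era API (assuming a SQLContext is in scope):

```scala
// Tables whose estimated size is below this threshold (in bytes) are
// broadcast to all executors instead of being shuffled for the join.
sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", (10 * 1024 * 1024).toString)
```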
I guess you have something
Yes - it is still in progress, but I just have not had time to get to
this. I think getting the repo in the codebase moved from mesos to amplab
by 1.5 should be possible.
Thanks
Shivaram
On Fri, Jul 31, 2015 at 3:08 AM, Sean Owen so...@cloudera.com wrote:
PS is this still in progress? it
Hi everyone,
I'm wondering whether there is any plan to implement a cost-based optimizer
for Spark SQL?
Best regards...
--
*BURAK ISIKLI* | *http://burakisikli.wordpress.com*
I tried to enable Tungsten with Spark SQL and set the 3 parameters below, but I
found that Spark SQL always hangs at the point below. So could you please point
out the potential cause? I'd appreciate any input.
spark.shuffle.manager=tungsten-sort
spark.sql.codegen=true
spark.sql.unsafe.enabled=true
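For what it's worth, the same three settings can also be passed at submit time; a sketch (the application jar name is a placeholder):

```shell
# your-app.jar is a placeholder for the actual application jar
spark-submit \
  --conf spark.shuffle.manager=tungsten-sort \
  --conf spark.sql.codegen=true \
  --conf spark.sql.unsafe.enabled=true \
  your-app.jar
```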
Dear Spark Dev Community,
I am wondering if there is already a function to solve my problem. If not,
then should I work on this?
Say you just want to check if a word exists in a huge text file. I could
not find better ways than those mentioned here
Hi,
the RDD class does not have an exists() method (in the Scala API), but
the functionality you need seems easy to reproduce with the existing methods:
val containsNMatchingElements =
  data.filter(qualifying_function).take(n).length >= n
Note: I am not sure whether the intermediate take(n) really
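To illustrate the filter-then-take idea in plain Scala collections (not Spark; existsAtLeast is a made-up name for this sketch):

```scala
// Plain-Scala analogue of the RDD pattern above: filter lazily,
// take at most n matches, and check whether n were actually found.
// On an Iterator, this stops scanning once n matches have been seen;
// whether Spark's take(n) avoids a full scan is the open question above.
def existsAtLeast[T](data: Iterable[T], p: T => Boolean, n: Int): Boolean =
  data.iterator.filter(p).take(n).length >= n
```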
Another error:
15/07/31 16:15:28 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send
map output locations for shuffle 3 to bignode1:40443
15/07/31 16:15:28 INFO spark.MapOutputTrackerMaster: Size of output statuses
for shuffle 3 is 583 bytes
15/07/31 16:15:28 INFO
Is this deterministically reproducible? Can you try this on the latest
master branch?
It would be great to turn on debug logging and dump the generated code. Also
it would be great to dump the array size at your line 314 in UnsafeRow (and
whatever the appropriate line is on the master branch).
On Fri, Jul 31,
Hello!
You could try something like this:
def exists[T](rdd: RDD[T])(f: T => Boolean, n: Int): Boolean = {
  rdd.filter(f).countApprox(timeout = 1).getFinalValue().low >= n
}
It would work for large datasets and large values of n.
Have a nice day,
Jonathan
On 31 July 2015 at 11:29, Carsten
PS is this still in progress? it feels like something that would be
good to do before 1.5.0, if it's going to happen soon.
On Wed, Jul 22, 2015 at 6:59 AM, Shivaram Venkataraman
shiva...@eecs.berkeley.edu wrote:
Yeah I'll send a note to the mesos dev list just to make sure they are
informed.