Re: Optimize encoding/decoding strings when using Parquet

2015-02-13 Thread Mick Davies
I have put in a PR on Parquet to support dictionaries when filters are pushed
down, which should reduce binary conversion overhead when Spark pushes down
string predicates on columns that are dictionary encoded.

https://github.com/apache/incubator-parquet-mr/pull/117

It's blocked at the moment because part of my Parquet build fails on my Mac,
due to an issue getting Thrift 0.7 installed. The installation instructions
available on Parquet do not seem to work, I think because of this issue:
https://issues.apache.org/jira/browse/THRIFT-2229

This is not directly related to Spark, but I wondered if anyone has got
Thrift 0.7 working on Mac OS X Yosemite (10.10), or can suggest a workaround.



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Optimize-encoding-decoding-strings-when-using-Parquet-tp10141p10617.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Caching tables at column level

2015-02-13 Thread Mick Davies
Thanks - we have tried this and it works nicely.



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Caching-tables-at-column-level-tp10377p10618.html



Re: Why a program would receive null from send message of mapReduceTriplets

2015-02-13 Thread James
I have a question:

*How do the attributes of a graph's triplets get updated after the
mapVertices() function?*

My code:

```
// Initialize the graph: assign each vertex a counter that contains only
// its own vertex id
var anfGraph = graph.mapVertices { case (vid, _) =>
  val counter = new HyperLogLog(5)
  counter.offer(vid)
  counter
}

// Find a triplet whose source attribute is null
val nullTriplet = anfGraph.triplets.filter(edge => edge.srcAttr == null).first

// Look up that triplet's source vertex in the vertex RDD
anfGraph.vertices.filter(_._1 == nullTriplet.srcId).first
// Here I can see that the vertex has a non-null attribute

// messages = anfGraph.aggregateMessages(msgFun, mergeMessage)   // -> NullPointerException

```

I found that some vertex attributes in some triplets are null, but not
all.


Alcaid


2015-02-13 14:50 GMT+08:00 Reynold Xin r...@databricks.com:

 Then maybe you actually had a null in your vertex attribute?


 On Thu, Feb 12, 2015 at 10:47 PM, James alcaid1...@gmail.com wrote:

 I changed the mapReduceTriplets() func to aggregateMessages(), but it
 still failed.


 2015-02-13 6:52 GMT+08:00 Reynold Xin r...@databricks.com:

 Can you use the new aggregateMessages method? I suspect the null is
 coming from automatic join elimination, which inspects the bytecode to see
 if you need the src or dst vertex data. Occasionally it can fail to detect
 this. In the new aggregateMessages API, the caller needs to explicitly
 specify that, making it more robust.
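
For illustration, the explicit field declaration described in the reply above
can be sketched as follows, assuming the GraphX aggregateMessages signature
that takes a TripletFields argument (Spark 1.2). The tiny graph, the object
name, and the Set[VertexId] used as a stand-in for the HyperLogLog counter are
assumptions made for this sketch, not code from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, TripletFields, VertexId}

// Hypothetical driver object, for illustration only.
object TripletFieldsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("triplet-fields-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Tiny assumed graph: 1 -> 2 -> 3
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 0), Edge(2L, 3L, 0)))
    val graph = Graph.fromEdges(edges, 0)

    // Stand-in for the HyperLogLog counter: a set holding the vertex's own id.
    val withSets: Graph[Set[VertexId], Int] =
      graph.mapVertices { (vid, _) => Set(vid) }

    // sendMsg reads only the destination attribute, so TripletFields.Dst is
    // declared explicitly instead of relying on bytecode inspection.
    // Fields that are not requested should not be read inside sendMsg.
    val messages = withSets.aggregateMessages[Set[VertexId]](
      ctx => ctx.sendToSrc(ctx.dstAttr),
      _ ++ _,
      TripletFields.Dst)

    messages.collect().foreach(println)
    sc.stop()
  }
}
```

If sendMsg needs both endpoint attributes, TripletFields.All (the default)
requests them; the direction of the messages here is only one possible choice.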


 On Thu, Feb 12, 2015 at 6:26 AM, James alcaid1...@gmail.com wrote:

 Hello,

 When I run the code on a much bigger graph, I get a
 NullPointerException.

 I found that this is because the sendMessage() function receives a triplet
 in which edge.srcAttr or edge.dstAttr is null. I wonder why this happens,
 as I am sure every vertex has an attribute.

 Any reply is appreciated.

 Alcaid


 2015-02-11 19:30 GMT+08:00 James alcaid1...@gmail.com:

  Hello,
 
  Recently I have been trying to estimate the average distance of a big
  graph using Spark with the help of [HyperAnf](
  http://dl.acm.org/citation.cfm?id=1963493).
 
  It works like the Connected Components algorithm, except that the
  attribute of a vertex is a HyperLogLog counter that, at the k-th
  iteration, estimates the number of vertices it can reach in fewer than
  k hops.
 
  I have successfully run the code on a graph with 20M vertices. But I
  still need help:
 
 
  *I think the code could work more efficiently, especially the send
  message function, but I am not sure what will happen if a vertex
  receives no message in an iteration.*
 
  Here is my code: https://github.com/alcaid1801/Erdos
 
  Any reply is appreciated.
 







FW: Trouble posting to the list

2015-02-13 Thread Mattmann, Chris A (3980)
FYI

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: Dima Zhiyanov dimazhiya...@hotmail.com
Date: Thursday, February 12, 2015 at 7:04 AM
To: user-ow...@spark.apache.org user-ow...@spark.apache.org
Subject: Trouble posting to the list

Hello

After numerous attempts I am still unable to post to the list. After I
click Subscribe I do not get an e-mail which allows me to confirm my
subscription. Could you please add me manually?

Thanks a lot
Dima

Sent from my iPhone

