Re: Optimize encoding/decoding strings when using Parquet
I have put in a PR on Parquet to support dictionaries when filters are pushed down, which should reduce binary conversion overhead when Spark pushes down string predicates on columns that are dictionary encoded: https://github.com/apache/incubator-parquet-mr/pull/117

It's blocked at the moment because part of my Parquet build fails on my Mac due to issues getting Thrift 0.7 installed. The installation instructions available on Parquet do not seem to work, I think because of this issue: https://issues.apache.org/jira/browse/THRIFT-2229. This is not directly related to Spark, but I wondered if anyone has got Thrift 0.7 working on Mac OS X Yosemite (10.10), or can suggest a workaround.

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Optimize-encoding-decoding-strings-when-using-Parquet-tp10141p10617.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
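To illustrate the idea behind the PR (this is a hypothetical sketch, not the actual Parquet internals): with dictionary encoding, a column stores each distinct string once in a dictionary and each row as a small integer id. A pushed-down string predicate can then be evaluated once per dictionary entry instead of once per row, so the expensive binary-to-String conversion happens O(distinct values) times rather than O(rows) times. The object and method names below are made up for the example:

```scala
object DictionaryFilterSketch {
  /** Return the indices of rows whose decoded value satisfies `predicate`,
    * decoding each distinct value only once via the dictionary. */
  def filterRows(dictionary: Array[String],
                 rowIds: Array[Int],
                 predicate: String => Boolean): Seq[Int] = {
    // Evaluate the predicate against the dictionary once...
    val matchingIds: Set[Int] =
      dictionary.indices.filter(i => predicate(dictionary(i))).toSet
    // ...then filter rows by integer id, with no per-row string decoding.
    rowIds.indices.filter(row => matchingIds(rowIds(row)))
  }

  def main(args: Array[String]): Unit = {
    val dict = Array("apple", "banana", "cherry") // distinct column values
    val rows = Array(0, 1, 0, 2, 1, 0)            // per-row dictionary ids
    println(filterRows(dict, rows, _.startsWith("b"))) // rows 1 and 4 hold "banana"
  }
}
```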
Re: Caching tables at column level
Thanks - we have tried this and it works nicely.

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Caching-tables-at-column-level-tp10377p10618.html
Re: Why a program would receive null from send message of mapReduceTriplets
I have a question: *how could the attributes of the triplets of a graph get updated after a mapVertices() call?*

My code:

```scala
// Initialize the graph: assign each vertex a HyperLogLog counter
// that initially contains only the vertex id.
var anfGraph = graph.mapVertices { case (vid, _) =>
  val counter = new HyperLogLog(5)
  counter.offer(vid)
  counter
}

val nullVertex = anfGraph.triplets.filter(edge => edge.srcAttr == null).first
anfGraph.vertices.filter(_._1 == nullVertex).first
// I can see here that the vertex has a non-null attribute.

// messages = anfGraph.aggregateMessages(msgFun, mergeMessage)  // <- NullPointerException
```

I found that the vertex attributes in some triplets are null, but not in all of them.

Alcaid

2015-02-13 14:50 GMT+08:00 Reynold Xin r...@databricks.com:

Then maybe you actually had a null in your vertex attribute?

On Thu, Feb 12, 2015 at 10:47 PM, James alcaid1...@gmail.com wrote:

I changed the mapReduceTriplets() call to aggregateMessages(), but it still failed.

2015-02-13 6:52 GMT+08:00 Reynold Xin r...@databricks.com:

Can you use the new aggregateMessages method? I suspect the null is coming from automatic join elimination, which inspects bytecode to see whether you need the src or dst vertex data. Occasionally it can fail to detect that. In the new aggregateMessages API, the caller needs to specify it explicitly, making it more robust.

On Thu, Feb 12, 2015 at 6:26 AM, James alcaid1...@gmail.com wrote:

Hello,

When I run the code on a much bigger graph, I get a NullPointerException. I found that this is because the sendMessage() function receives a triplet whose edge.srcAttr or edge.dstAttr is null. I wonder why this happens, as I am sure every vertex has an attribute.

Any response is appreciated.

Alcaid

2015-02-11 19:30 GMT+08:00 James alcaid1...@gmail.com:

Hello,

Recently I have been trying to estimate the average distance of a big graph using Spark, with the help of [HyperANF](http://dl.acm.org/citation.cfm?id=1963493).
It works like the Connected Components algorithm, except that the attribute of a vertex is a HyperLogLog counter which, at the k-th iteration, estimates the number of vertices it can reach in fewer than k hops. I have successfully run the code on a graph with 20M vertices, but I still need help: *I think the code could work more efficiently, especially the send-message function, but I am not sure what happens if a vertex receives no message in an iteration.* Here is my code: https://github.com/alcaid1801/Erdos Any response is appreciated.
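The iteration described above can be sketched in plain Scala on exact sets (an assumption for readability: real HyperANF replaces these sets with HyperLogLog counters so each merge stays constant-size; the names here are made up for the example). At iteration k each vertex's set holds the vertices reachable within k hops, and a vertex merges its neighbors' previous-round sets, which plays the role of the send/merge message functions:

```scala
object AnfSketch {
  /** Per-vertex count of vertices reachable within k hops,
    * treating the edge list as undirected. */
  def anf(vertices: Seq[Long], edges: Seq[(Long, Long)], k: Int): Map[Long, Int] = {
    val neighbors: Map[Long, Seq[Long]] =
      (edges ++ edges.map(e => (e._2, e._1)))
        .groupBy(_._1)
        .map { case (v, es) => v -> es.map(_._2) }
        .withDefaultValue(Seq.empty)

    // k = 0: each vertex reaches only itself.
    var reach: Map[Long, Set[Long]] = vertices.map(v => v -> Set(v)).toMap

    for (_ <- 1 to k) {
      reach = vertices.map { v =>
        // Merge the previous-round sets of all neighbors (the "message").
        v -> neighbors(v).foldLeft(reach(v))((acc, u) => acc union reach(u))
      }.toMap
    }
    reach.map { case (v, s) => v -> s.size }
  }

  def main(args: Array[String]): Unit = {
    // Path graph 1 - 2 - 3 - 4, two iterations:
    val counts = anf(Seq(1L, 2L, 3L, 4L), Seq((1L, 2L), (2L, 3L), (3L, 4L)), 2)
    println(counts) // vertex 2 reaches {1, 2, 3, 4} within 2 hops
  }
}
```

Note that in this formulation a vertex whose neighbor sets stopped growing simply keeps its current set, which is one way to think about the "no message received" case in the question above.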
FW: Trouble posting to the list
FYI

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++

-----Original Message-----
From: Dima Zhiyanov dimazhiya...@hotmail.com
Date: Thursday, February 12, 2015 at 7:04 AM
To: user-ow...@spark.apache.org
Subject: Trouble posting to the list

Hello,

After numerous attempts I am still unable to post to the list. After I click Subscribe, I do not get an e-mail that allows me to confirm my subscription. Could you please add me manually?

Thanks a lot,
Dima

Sent from my iPhone