RE: Unsupported Catalyst types in Parquet

2014-12-29 Thread Wang, Daoyuan
Hi Alex, I'll create JIRA SPARK-4985 for date type support in Parquet, and SPARK-4987 for timestamp type support. For decimal type, I think we only support decimals that fit in a long. Thanks, Daoyuan
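A hedged illustration of "decimals that fit in a long": an unscaled decimal value of at most 18 digits always fits in a signed 64-bit integer, while 19-digit values can overflow it. The precision cutoff below is basic arithmetic, not a quote of Spark's actual check.

```python
# A long is a signed 64-bit integer; its maximum value has 19 digits.
LONG_MAX = 2**63 - 1  # 9223372036854775807

largest_18_digit = 10**18 - 1
largest_19_digit = 10**19 - 1

print(largest_18_digit <= LONG_MAX)  # True: every 18-digit unscaled value fits
print(largest_19_digit <= LONG_MAX)  # False: some 19-digit values do not
```
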

Re: Spark 1.2.0 build error

2014-12-29 Thread Sean Owen
It means a test failed, but you have not shown the test failure. This would have been logged earlier. You would need to say how you ran the tests, too. The tests for 1.2.0 pass for me on several common permutations.

How to become spark developer in jira?

2014-12-29 Thread Jakub Dubovsky
Hi devs, I'd like to ask what the procedures/conditions are for being assigned a developer role on the Spark JIRA. My motivation is to be able to assign issues to myself. The only related resource I have found is the JIRA permission scheme [1]. Regards, Jakub [1]

RE: Unsupported Catalyst types in Parquet

2014-12-29 Thread Alessandro Baretta
Daoyuan, Thanks for creating the JIRAs. I need these features by... last week, so I'd be happy to take care of this myself, if only you or someone more experienced than me in the Spark SQL codebase could provide some guidance. Alex

Re: How to become spark developer in jira?

2014-12-29 Thread Matei Zaharia
Please ask someone else to assign them for now, and just comment on them that you're working on them. Over time if you contribute a bunch we'll add you to that list. The problem is that in the past, people would assign issues to themselves and never actually work on them, making it confusing

Re: How to become spark developer in jira?

2014-12-29 Thread Jakub Dubovsky
Hi Matei, that makes sense. Thanks a lot! Jakub

Re: Which committers care about Kafka?

2014-12-29 Thread Tathagata Das
Hey all, Some wrap up thoughts on this thread. Let me first reiterate what Patrick said, that Kafka is super super important as it forms the largest fraction of Spark Streaming user base. So we really want to improve the Kafka + Spark Streaming integration. To this end, some of the things that

Re: Which committers care about Kafka?

2014-12-29 Thread Cody Koeninger
Can you give a little more clarification on exactly what is meant by "1. Data rate control"? If someone wants to clamp the maximum number of messages per RDD partition in my solution, it would be very straightforward to do so. Regarding the "holy grail", I'm pretty certain you can't have end-to-end

Re: Unsupported Catalyst types in Parquet

2014-12-29 Thread Michael Armbrust
I'd love to get both of these in. There is some trickiness that I talk about on the JIRA for timestamps, since the SQL timestamp class can support nanoseconds and I don't think Parquet has a type for this. Other systems (Impala) seem to use INT96. It would be great to maybe ask on the Parquet
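A hedged sketch of the INT96 timestamp layout the thread attributes to Impala: 8 little-endian bytes of nanoseconds within the day, followed by 4 bytes of Julian day number. This is an illustration of the encoding convention only, not code from Spark or Parquet.

```python
import struct
from datetime import datetime, timezone

JULIAN_DAY_OF_EPOCH = 2440588  # Julian day number of 1970-01-01

def to_int96(dt: datetime) -> bytes:
    """Pack a datetime into the 12-byte INT96 layout: nanos-of-day, Julian day."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    delta = dt.astimezone(timezone.utc) - epoch
    julian_day = JULIAN_DAY_OF_EPOCH + delta.days
    nanos_of_day = (delta.seconds * 1_000_000 + delta.microseconds) * 1_000
    return struct.pack("<qi", nanos_of_day, julian_day)

raw = to_int96(datetime(2014, 12, 29, tzinfo=timezone.utc))
print(len(raw))                   # 12
print(struct.unpack("<qi", raw))  # (0, 2457021)
```

Note the layout only carries microsecond-derived nanoseconds here; a real writer would need a true nanosecond source to use the full resolution.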

Re: Spark 1.2.0 build error

2014-12-29 Thread Naveen Madhire
I am getting a "The command is too long" error. Is there anything which needs to be done? However, for the time being I followed the sbt way of building Spark in IntelliJ.

RE: Build Spark 1.2.0-rc1 encounter exceptions when running HiveContext - Caused by: java.lang.ClassNotFoundException: com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy

2014-12-29 Thread Andrew Lee
Hi Patrick, I manually hardcoded the Hive version to 0.13.1a and it works. It turns out that for some reason, 0.13.1 is being picked up instead of the 0.13.1a version from Maven. So my solution was: hardcode hive.version to 0.13.1a, in my case, since I am building against Hive 0.13 only, so
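A sketch of the workaround described above, assuming a standard Maven build of Spark 1.2 with the Hive profile; the exact profiles for a given checkout may differ, and the property override is the point here:

```shell
# Force the 0.13.1a Hive artifact instead of the 0.13.1 one being resolved
mvn -Phive -Phive-thriftserver -Dhive.version=0.13.1a -DskipTests clean package
```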

Adding third party jars to classpath used by pyspark

2014-12-29 Thread Stephen Boesch
What is the recommended way to do this? We have some native database client libraries for which we are adding pyspark bindings. The pyspark script invokes spark-submit. Do we add our libraries to the SPARK_SUBMIT_LIBRARY_PATH? This issue relates back to an error we have been seeing: Py4jError:

Re: Unsupported Catalyst types in Parquet

2014-12-29 Thread Alessandro Baretta
Michael, Actually, Adrian Wang already created pull requests for these issues: https://github.com/apache/spark/pull/3820 https://github.com/apache/spark/pull/3822 What do you think? Alex

RE: Which committers care about Kafka?

2014-12-29 Thread Shao, Saisai
Hi Cody, From my understanding, rate control is an optional configuration in Spark Streaming and is disabled by default, so users can reach maximum throughput without any configuration. The reason why rate control is so important in stream processing is that Spark Streaming and other

Re: Adding third party jars to classpath used by pyspark

2014-12-29 Thread Jeremy Freeman
Hi Stephen, it should be enough to include --jars /path/to/file.jar in the command line call to either pyspark or spark-submit, as in spark-submit --master local --jars /path/to/file.jar myfile.py, and you can check the bottom of the Web UI's "Environment" tab to make sure the jar gets on
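The advice above can be sketched as the following invocations (paths are placeholders); the last line is an assumption for the native-library case the original question raises, using spark-submit's --driver-library-path flag rather than SPARK_SUBMIT_LIBRARY_PATH:

```shell
# Put a third-party jar on the classpath for a PySpark job
spark-submit --master local --jars /path/to/file.jar myfile.py

# Same flag works for an interactive pyspark shell
pyspark --jars /path/to/file.jar

# Native (non-jar) client libraries may also need the library load path extended
spark-submit --driver-library-path /path/to/native/libs myfile.py
```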

A question about using insert into in rdd foreach in spark 1.2

2014-12-29 Thread evil
Hi All, I have a problem when I try to use insert into in a loop, and this is my code: def main(args: Array[String]) { // this is an empty table, schema is (Int, String) sqlContext.parquetFile("Data\\Test\\Parquet\\Temp").registerTempTable("temp") // not an empty table, schema is (Int, String)

Help, pyspark.sql.List flatMap results become tuple

2014-12-29 Thread guoxu1231
Hi pyspark guys, I have a json file, and its struct is like below: {"NAME":"George", "AGE":35, "ADD_ID":1212, "POSTAL_AREA":1, "TIME_ZONE_ID":1, "INTEREST":[{"INTEREST_NO":1, "INFO":"x"}, {"INTEREST_NO":2, "INFO":"y"}]} {"NAME":"John", "AGE":45, "ADD_ID":1213, "POSTAL_AREA":1, "TIME_ZONE_ID":1, "INTEREST":[{"INTEREST_NO":2, "INFO":"x"},

Re: Help, pyspark.sql.List flatMap results become tuple

2014-12-29 Thread guoxu1231
Named tuples degenerate to plain tuples: A400.map(lambda i: map(None, i.INTEREST)) === [(u'x', 1), (u'y', 2)] [(u'x', 2), (u'y', 3)]
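The behavior above can be reproduced without Spark. Here namedtuple stands in for pyspark.sql.Row, which is likewise a tuple subclass; the field names are an assumption matching the JSON in the thread. Generic tuple operations silently drop the names, which is why the flatMap results "become tuples":

```python
from collections import namedtuple

# Stand-in for a pyspark.sql Row with the thread's nested schema
Interest = namedtuple("Interest", ["INFO", "INTEREST_NO"])

interests = [Interest(u"x", 1), Interest(u"y", 2)]

# Converting through plain tuple operations loses the field names...
as_plain = [tuple(i) for i in interests]
print(as_plain)           # [('x', 1), ('y', 2)]

# ...but the names survive on the original objects
print(interests[0].INFO)  # x
```
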

Re: Which committers care about Kafka?

2014-12-29 Thread Cody Koeninger
Assuming you're talking about spark.streaming.receiver.maxRate, I just updated my PR to configure rate limiting based on that setting. So hopefully that's issue 1 sorted. Regarding issue 3, as far as I can tell regarding the odd semantics of stateful or windowed operations in the face of
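For reference, the setting discussed above is a standard Spark Streaming property; the value below is illustrative only:

```
# spark-defaults.conf fragment: cap per-receiver ingestion in records/sec;
# unset means unlimited (maximum throughput).
spark.streaming.receiver.maxRate   10000
```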

Problems concerning implementing machine learning algorithm from scratch based on Spark

2014-12-29 Thread danqing0703
Hi all, I am trying to use some machine learning algorithms that are not included in the Mllib. Like Mixture Model and LDA(Latent Dirichlet Allocation), and I am using pyspark and Spark SQL. My problem is: I have some scripts that implement these algorithms, but I am not sure which part I shall

RE: Unsupported Catalyst types in Parquet

2014-12-29 Thread Wang, Daoyuan
By adding a flag in SQLContext, I have modified #3822 to include nanoseconds now. Since passing too many flags is ugly, I now need the whole SQLContext, so that we can put more flags there. Thanks, Daoyuan