Understanding reported times on the Spark UI [+ Streaming]

2014-12-08 Thread Gerard Maas
Hi, I'm confused about the stage times reported on the Spark UI (Spark 1.1.0) for a Spark Streaming job. I'm hoping somebody can shed some light on it. Let's do this with an example: on the /stages page, stage #232 is reported to have lasted 18 seconds: 232 runJob at RDDFunctions.scala:23

Re: Hive Problem in Pig generated Parquet file schema in CREATE EXTERNAL TABLE (e.g. bag::col1)

2014-12-08 Thread Michael Armbrust
This is by Hive's design. From the Hive documentation: "The column change command will only modify Hive's metadata, and will not modify data. Users should make sure the actual data layout of the table/partition conforms with the metadata definition." On Sat, Dec 6, 2014 at 8:28 PM, Jianshi

Re: Handling stale PRs

2014-12-08 Thread Nicholas Chammas
I recently came across this blog post, which reminded me of this thread. How to Discourage Open Source Contributions http://danluu.com/discourage-oss/ We are currently at 320+ open PRs, many of which haven't been updated in over a month. We have quite a few PRs that haven't been touched in 3-5

Re: Handling stale PRs

2014-12-08 Thread Ganelin, Ilya
Thank you for pointing this out, Nick. I know that for myself and my colleague who are starting to contribute to Spark, it's definitely discouraging to have fixes sitting in the pipeline. Could you recommend any other ways that we can facilitate getting these PRs accepted? Clean, well-tested code

Re: Handling stale PRs

2014-12-08 Thread Nicholas Chammas
Things that help:
- Be persistent. People are busy, so just ping them if there’s been no response for a couple of weeks. Hopefully, as the project continues to develop, this will become less necessary.
- Only ping reviewers after test results are back from Jenkins. Make sure all

Re: scala.MatchError on SparkSQL when creating ArrayType of StructType

2014-12-08 Thread Yin Huai
Seems you hit https://issues.apache.org/jira/browse/SPARK-4245. It was fixed in 1.2. Thanks, Yin On Wed, Dec 3, 2014 at 11:50 AM, invkrh inv...@gmail.com wrote: Hi, I am using SparkSQL on 1.1.0 branch. The following code leads to a scala.MatchError at

Re: CREATE TABLE AS SELECT does not work with temp tables in 1.2.0

2014-12-08 Thread Michael Armbrust
This is merged now and should be fixed in the next 1.2 RC. On Sat, Dec 6, 2014 at 8:28 PM, Cheng, Hao hao.ch...@intel.com wrote: I've created (reused) the PR https://github.com/apache/spark/pull/3336, hopefully we can fix this regression. Thanks for reporting. Cheng Hao -Original

Re: Hive Problem in Pig generated Parquet file schema in CREATE EXTERNAL TABLE (e.g. bag::col1)

2014-12-08 Thread Jianshi Huang
Ah... I see. Thanks for pointing it out. Then it means we cannot mount an external table using customized column names. Hmm... Then the only option left is to use a subquery to add a bunch of column aliases. I'll try it later. Thanks, Jianshi On Tue, Dec 9, 2014 at 3:34 AM, Michael Armbrust