Re: df.dtypes -> pyspark.sql.types

2016-03-19 Thread Ruslan Dautkhanov
/d57daf1f7732a7ac54a91fe112deeda0a254f9ef/python/pyspark/sql/types.py -- Ruslan Dautkhanov On Wed, Mar 16, 2016 at 4:44 PM, Reynold Xin <r...@databricks.com> wrote: > We probably should have the alias. Is this still a problem on master > branch? > > On Wed, Mar 16, 2016 at 9:40 AM, Ruslan D

Re: df.dtypes -> pyspark.sql.types

2016-03-19 Thread Ruslan Dautkhanov
r: Could not parse datatype: bigint Looks like pyspark.sql.types doesn't know anything about bigint.. Should it be aliased to LongType in pyspark.sql.types? Thanks On Wed, Mar 16, 2016 at 10:18 AM, Ruslan Dautkhanov <dautkha...@gmail.com> wrote: > Hello, > > Looking at > &

df.dtypes -> pyspark.sql.types

2016-03-19 Thread Ruslan Dautkhanov
IntegerType() for "integer" etc? If it doesn't exist it would be great to have such a mapping function. Thank you. ps. I have a data frame, and use its dtypes to loop through all columns to fix a few columns' data types as a workaround for SPARK-13866. -- Ruslan Dautkhanov
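A minimal PySpark sketch of the kind of mapping the question asks for, assuming a hand-built dict (including the bigint -> LongType alias discussed above) and a re-cast via select; this is an illustration, not an API that ships with Spark:

```python
from pyspark.sql import functions as F
from pyspark.sql.types import (StringType, IntegerType, LongType,
                               DoubleType, TimestampType)

# Hand-built dtype-string -> pyspark.sql.types mapping (an assumption, not an
# official API), including the bigint -> LongType alias from the thread.
DTYPE_TO_TYPE = {
    "string": StringType(),
    "int": IntegerType(),
    "bigint": LongType(),
    "double": DoubleType(),
    "timestamp": TimestampType(),
}

def recast(df, overrides):
    """Loop over df.dtypes and re-cast only the columns listed in `overrides`,
    e.g. overrides={"user_id": "bigint"} (hypothetical column name)."""
    return df.select([
        (F.col(name).cast(DTYPE_TO_TYPE[overrides[name]]).alias(name)
         if name in overrides else F.col(name))
        for name, dtype in df.dtypes
    ])
```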

Spark session dies in about 2 days: HDFS_DELEGATION_TOKEN token can't be found

2016-03-11 Thread Ruslan Dautkhanov
Spark session dies out after ~40 hours when running against a secure Hadoop cluster. spark-submit has --principal and --keytab, so kerberos ticket renewal works fine according to the logs. Something happens with the HDFS connection? These messages come up every second: See complete stack:

binary file deserialization

2016-03-09 Thread Ruslan Dautkhanov
is known and well documented. -- Ruslan Dautkhanov

Re: Spark + Sentry + Kerberos don't add up?

2016-02-24 Thread Ruslan Dautkhanov
Turns to be it is a Spark issue https://issues.apache.org/jira/browse/SPARK-13478 -- Ruslan Dautkhanov On Mon, Jan 18, 2016 at 4:25 PM, Ruslan Dautkhanov <dautkha...@gmail.com> wrote: > Hi Romain, > > Thank you for your response. > > Adding Kerberos support might be

spark.storage.memoryFraction for shuffle-only jobs

2016-02-04 Thread Ruslan Dautkhanov
For a Spark job that only does shuffling (e.g. Spark SQL with joins, group bys, analytical functions, order bys), but no explicit persistent RDDs nor dataframes (there are no .cache()es in the code), what would be the lowest recommended setting for spark.storage.memoryFraction?
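For reference, a minimal sketch (not from the thread) of how those legacy static-memory-management knobs would be set for such a job; the 0.1 / 0.6 values are illustrative assumptions, not a recommendation:

```python
from pyspark import SparkConf, SparkContext

# Shuffle-heavy job with no .cache()/.persist(): shrink storage memory and give
# it to shuffle instead (Spark 1.x defaults are 0.6 and 0.2 respectively).
conf = (SparkConf()
        .setAppName("shuffle-only-job")
        .set("spark.storage.memoryFraction", "0.1")
        .set("spark.shuffle.memoryFraction", "0.6"))
sc = SparkContext(conf=conf)
```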

Re: Hive on Spark knobs

2016-01-29 Thread Ruslan Dautkhanov
Yep, I tried that. It seems you're right. I got an error that the execution engine has to be set to mr (hive.execution.engine = mr). I did not keep the exact error message/stack. It's probably disabled explicitly. -- Ruslan Dautkhanov On Thu, Jan 28, 2016 at 7:03 AM, Todd <bit1...@163.com> wrote:

Hive on Spark knobs

2016-01-27 Thread Ruslan Dautkhanov
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started There are quite a lot of knobs to tune for Hive on Spark. The page above recommends the following settings: mapreduce.input.fileinputformat.split.maxsize=75000 > hive.vectorized.execution.enabled=true >

Re: Spark + Sentry + Kerberos don't add up?

2016-01-20 Thread Ruslan Dautkhanov
I took the liberty and created a JIRA https://github.com/cloudera/livy/issues/36 Feel free to close it if it doesn't belong to the Livy project. I really don't know if this is a Spark or a Livy/Sentry problem. Any ideas for possible workarounds? Thank you. -- Ruslan Dautkhanov On Mon, Jan 18, 2016

Re: Spark + Sentry + Kerberos don't add up?

2016-01-18 Thread Ruslan Dautkhanov
() hence the error. So Sentry isn't compatible with Spark in kerberized clusters? Is there any workaround for this problem? -- Ruslan Dautkhanov On Mon, Jan 18, 2016 at 3:52 PM, Romain Rigaux <rom...@cloudera.com> wrote: > Livy does not support any Kerberos yet > https://issues.cloudera

Spark + Sentry + Kerberos don't add up?

2016-01-17 Thread Ruslan Dautkhanov
d to impersonate to other users. So very convenient for Spark Notebooks. Any information to help solve this will be highly appreciated. -- Ruslan Dautkhanov

livy test problem: Failed to execute goal org.scalatest:scalatest-maven-plugin:1.0:test (test) on project livy-spark_2.10: There are test failures

2016-01-14 Thread Ruslan Dautkhanov
The Livy build test from master fails with the problem below. I can't track it down. YARN shows the Livy Spark YARN application as running, although an attempt to connect to the application master shows connection refused: HTTP ERROR 500 > Problem accessing /proxy/application_1448640910222_0046/. Reason: >

Re: Spark on hbase using Phoenix in secure cluster

2015-12-07 Thread Ruslan Dautkhanov
Spark > 1.3.1 does not provide integration with Phoenix for kerberized cluster. > > Can anybody confirm whether Spark 1.3.1 supports Phoenix on secured > cluster or not? > > Thanks, > Akhilesh > > On Tue, Dec 8, 2015 at 2:57 AM, Ruslan Dautkhanov <dautkha...@gmail.com&g

Re: SparkSQL AVRO

2015-12-07 Thread Ruslan Dautkhanov
-table-in-hive/34059289#34059289 -- Ruslan Dautkhanov On Mon, Dec 7, 2015 at 11:27 AM, Test One <t...@cksworks.com> wrote: > I'm using spark-avro with SparkSQL to process and output avro files. My > data has the following schema: > > root > |-- memberUuid: st

Re: Spark on hbase using Phoenix in secure cluster

2015-12-07 Thread Ruslan Dautkhanov
) kerberos ticket for authentication to pass. -- Ruslan Dautkhanov On Mon, Dec 7, 2015 at 12:54 PM, Akhilesh Pathodia < pathodia.akhil...@gmail.com> wrote: > Hi, > > I am running spark job on yarn in cluster mode in secured cluster. I am > trying to run Spark on Hbase using Phoenix, b

Re: question about combining small parquet files

2015-11-26 Thread Ruslan Dautkhanov
An interesting compaction approach for small files was discussed recently: http://blog.cloudera.com/blog/2015/11/how-to-ingest-and-query-fast-data-with-impala-without-kudu/ AFAIK Spark supports views too. -- Ruslan Dautkhanov On Thu, Nov 26, 2015 at 10:43 AM, Nezih Yigitbasi < nyig

Re: Spark REST Job server feedback?

2015-11-25 Thread Ruslan Dautkhanov
java#welcome-to-livy-the-rest-spark-server> " Although that post is from April 2015, not sure if it's still accurate. -- Ruslan Dautkhanov On Thu, Nov 26, 2015 at 12:04 AM, Deenar Toraskar <deenar.toras...@gmail.com > wrote: > Hi > > I had the same question. Anyone havi

Re: Data in one partition after reduceByKey

2015-11-25 Thread Ruslan Dautkhanov
more even distribution you could use a hash function of that, not just a remainder. -- Ruslan Dautkhanov On Mon, Nov 23, 2015 at 6:35 AM, Patrick McGloin <mcgloin.patr...@gmail.com> wrote: > I will answer my own question, since I figured it out. Here is my answer > in case any
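A small sketch of that suggestion in PySpark, assuming `rdd` is the (key, value) RDD in question; `stable_hash` is a made-up helper and 64 is an arbitrary partition count:

```python
import hashlib

def stable_hash(key):
    # Hash the string form of the key instead of relying on a plain remainder,
    # so clustered keys don't all land in the same partition.
    return int(hashlib.md5(str(key).encode("utf-8")).hexdigest(), 16)

counts = rdd.reduceByKey(lambda a, b: a + b)            # may leave everything in one partition
balanced = counts.partitionBy(64, partitionFunc=stable_hash)  # redistribute by a real hash
```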

Re: ISDATE Function

2015-11-18 Thread Ruslan Dautkhanov
You could write your own UDF isdate(). -- Ruslan Dautkhanov On Tue, Nov 17, 2015 at 11:25 PM, Ravisankar Mani <rrav...@gmail.com> wrote: > Hi Ted Yu, > > Thanks for your response. Is any other way to achieve in Spark Query? > > > Regards, > Ravi > > On Tue,
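A minimal sketch of such a UDF, assuming a single date format and a hypothetical column name dt_col:

```python
from datetime import datetime
from pyspark.sql.functions import udf
from pyspark.sql.types import BooleanType

def is_date(value, fmt="%Y-%m-%d"):
    # True only if the value parses with the given format; None and junk return False.
    try:
        datetime.strptime(value, fmt)
        return True
    except (TypeError, ValueError):
        return False

isdate = udf(is_date, BooleanType())
checked = df.select("dt_col", isdate(df["dt_col"]).alias("isdate"))
```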

Re: kerberos question

2015-11-06 Thread Ruslan Dautkhanov
thought its primary use is for Hue and similar services, which use impersonation quite heavily in a kerberized cluster. -- Ruslan Dautkhanov On Wed, Nov 4, 2015 at 1:40 PM, Ted Yu <yuzhih...@gmail.com> wrote: > 2015-11-04 10:03:31,905 ERROR [Delegation Token Refresh Thread-0] > hdfs.KeyP

Re: Pivot Data in Spark and Scala

2015-10-30 Thread Ruslan Dautkhanov
https://issues.apache.org/jira/browse/SPARK-8992 Should be in 1.6? -- Ruslan Dautkhanov On Thu, Oct 29, 2015 at 5:29 AM, Ascot Moss <ascot.m...@gmail.com> wrote: > Hi, > > I have data as follows: > > A, 2015, 4 > A, 2014, 12 > A, 2013, 1 > B, 2015, 24 >
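For reference, the DataFrame pivot that SPARK-8992 eventually added (Spark 1.6+), sketched against the sample rows quoted in the question; the column names and the sqlContext variable are assumptions:

```python
df = sqlContext.createDataFrame(
    [("A", 2015, 4), ("A", 2014, 12), ("A", 2013, 1), ("B", 2015, 24)],
    ["name", "year", "value"])

# One row per name, one column per year, summed values in the cells.
df.groupBy("name").pivot("year").sum("value").show()
```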

save DF to JDBC

2015-10-05 Thread Ruslan Dautkhanov
/apache/spark/sql/SQLContext.html and can't find anything relevant. Thanks! -- Ruslan Dautkhanov

Re: save DF to JDBC

2015-10-05 Thread Ruslan Dautkhanov
Thank you Richard and Matthew. DataFrameWriter first appeared in Spark 1.4. Sorry, I should have mentioned earlier that we're on CDH 5.4 / Spark 1.3. No options for this version? Best regards, Ruslan Dautkhanov On Mon, Oct 5, 2015 at 4:00 PM, Richard Hillegas <rhil...@us.ibm.com> wrote:

Re: Spark data type guesser UDAF

2015-09-21 Thread Ruslan Dautkhanov
). -- Ruslan Dautkhanov On Thu, Sep 17, 2015 at 12:32 PM, Ruslan Dautkhanov <dautkha...@gmail.com> wrote: > Wanted to take something like this > > https://github.com/fitzscott/AirQuality/blob/master/HiveDataTypeGuesser.java > and create a Hive UDAF to create an aggregate fun

Re: NGINX + Spark Web UI

2015-09-17 Thread Ruslan Dautkhanov
Similar setup for Hue http://gethue.com/using-nginx-to-speed-up-hue-3-8-0/ Might give you an idea. -- Ruslan Dautkhanov On Thu, Sep 17, 2015 at 9:50 AM, mjordan79 <renato.per...@gmail.com> wrote: > Hello! > I'm trying to set up a reverse proxy (using nginx) for the Spark Web UI

Spark data type guesser UDAF

2015-09-17 Thread Ruslan Dautkhanov
Wanted to take something like this https://github.com/fitzscott/AirQuality/blob/master/HiveDataTypeGuesser.java and create a Hive UDAF for an aggregate function that returns a data type guess. Am I reinventing the wheel? Does Spark have something like this already built-in? Would be very useful
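A rough, non-authoritative Python sketch of what such a per-column guess could look like (narrowest type first, fall back to string when values disagree); `rdd` is assumed to be an RDD of string-valued rows:

```python
from datetime import datetime

def guess_type(value):
    # Try the narrowest interpretation first, widen as parses fail.
    for caster, name in ((int, "int"), (float, "double")):
        try:
            caster(value)
            return name
        except (TypeError, ValueError):
            pass
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return "date"
    except (TypeError, ValueError):
        return "string"

# Aggregate over one column: keep the guess only if every row agrees on it.
guesses = rdd.map(lambda row: guess_type(row[0])).distinct().collect()
column_type = guesses[0] if len(guesses) == 1 else "string"
```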

Re: Spark ANN

2015-09-15 Thread Ruslan Dautkhanov
Thank you Alexander. Sounds like quite a lot of good and exciting changes slated for Spark's ANN. Looking forward to it. -- Ruslan Dautkhanov On Wed, Sep 9, 2015 at 7:10 PM, Ulanov, Alexander <alexander.ula...@hpe.com> wrote: > Thank you, Feynman, this is helpful. The paper that

Re: Best way to import data from Oracle to Spark?

2015-09-10 Thread Ruslan Dautkhanov
Sathish, Thanks for pointing to that. https://docs.oracle.com/cd/E57371_01/doc.41/e57351/copy2bda.htm That must be only part of Oracle's BDA codebase, not open-source Hive, right? -- Ruslan Dautkhanov On Thu, Sep 10, 2015 at 6:59 AM, Sathish Kumaran Vairavelu < vsathishkuma...@gmail.

Re: Avoiding SQL Injection in Spark SQL

2015-09-10 Thread Ruslan Dautkhanov
s point is more relevant for OLTP-like queries, which Spark is probably not yet good at (e.g. returning a few rows quickly / within a few ms). -- Ruslan Dautkhanov On Thu, Sep 10, 2015 at 12:07 PM, Michael Armbrust <mich...@databricks.com> wrote: > Either that or use the DataFrame API, wh
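The DataFrame-API route from the quoted reply, in a minimal sketch: the user-supplied value stays a Python parameter instead of being spliced into SQL text (the DataFrame and column name are made up):

```python
from pyspark.sql import functions as F

user_input = "O'Brien"   # would break naively concatenated SQL text
safe = df.filter(F.col("last_name") == user_input)
```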

Re: Best way to import data from Oracle to Spark?

2015-09-08 Thread Ruslan Dautkhanov
You can also sqoop oracle data in $ sqoop import --connect jdbc:oracle:thin:@localhost:1521/orcl --username MOVIEDEMO --password welcome1 --table ACTIVITY http://www.rittmanmead.com/2014/03/using-sqoop-for-loading-oracle-data-into-hadoop-on-the-bigdatalite-vm/ -- Ruslan Dautkhanov On Tue
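Besides sqoop, a hedged sketch of pulling the same table straight into a DataFrame over JDBC (Spark 1.4+ reader API; the Oracle JDBC driver jar has to be on the classpath, and the connection details are just the ones from the sqoop example above):

```python
activity = (sqlContext.read.format("jdbc")
            .option("url", "jdbc:oracle:thin:@localhost:1521/orcl")
            .option("dbtable", "MOVIEDEMO.ACTIVITY")
            .option("user", "MOVIEDEMO")
            .option("password", "welcome1")
            .load())
activity.show(5)
```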

Spark ANN

2015-09-07 Thread Ruslan Dautkhanov
http://people.apache.org/~pwendell/spark-releases/latest/ml-ann.html The implementation seems to be missing backpropagation? Was there a good reason to omit BP? What are the drawbacks of a pure feedforward-only ANN? Thanks! -- Ruslan Dautkhanov

Re: Spark ANN

2015-09-07 Thread Ruslan Dautkhanov
/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/ann/BreezeUtil.scala#L43 should read B :) -- Ruslan Dautkhanov On Mon, Sep 7, 2015 at 12:47 PM, Feynman Liang <fli...@databricks.com> wrote: > Backprop is used to compute the gradient here > <https://github.com/apache/sp

Re: Spark ANN

2015-09-07 Thread Ruslan Dautkhanov
Found a dropout commit from avulanov: https://github.com/avulanov/spark/commit/3f25e26d10ef8617e46e35953fe0ad1a178be69d It probably hasn't made its way to MLLib (yet?). -- Ruslan Dautkhanov On Mon, Sep 7, 2015 at 8:34 PM, Feynman Liang <fli...@databricks.com> wrote: > Unfortunately

Re: Parquet Array Support Broken?

2015-09-07 Thread Ruslan Dautkhanov
Read the response from Cheng Lian <lian.cs@gmail.com> on Aug 27th - it looks like the same problem. Workarounds: 1. write that parquet file in Spark; 2. upgrade to Spark 1.5. -- Ruslan Dautkhanov On Mon, Sep 7, 2015 at 3:52 PM, Alex Kozlov <ale...@gmail.com> wrote: > No, it was

Re: Parquet Array Support Broken?

2015-09-07 Thread Ruslan Dautkhanov
That parquet table wasn't created in Spark, was it? There was a recent discussion on this list that complex data types in Spark prior to 1.5 were often incompatible with Hive, for example, if I remember correctly. On Mon, Sep 7, 2015, 2:57 PM Alex Kozlov wrote: > I am trying to read

Re: Ranger-like Security on Spark

2015-09-03 Thread Ruslan Dautkhanov
est/topics/sg_hdfs_sentry_sync.html -- Ruslan Dautkhanov On Thu, Sep 3, 2015 at 1:46 PM, Daniel Schulz <danielschulz2...@hotmail.com> wrote: > Hi Matei, > > Thanks for your answer. > > My question is regarding simple authenticated Spark-on-YARN only, without > Ker

Re: FAILED_TO_UNCOMPRESS error from Snappy

2015-08-20 Thread Ruslan Dautkhanov
https://issues.apache.org/jira/browse/SPARK-7660 ? -- Ruslan Dautkhanov On Thu, Aug 20, 2015 at 1:49 PM, Kohki Nishio tarop...@gmail.com wrote: Right after upgraded to 1.4.1, we started seeing this exception and yes we picked up snappy-java-1.1.1.7 (previously snappy-java-1.1.1.6

Re: Spark Master HA on YARN

2015-08-16 Thread Ruslan Dautkhanov
There is no Spark master in YARN mode; that's standalone-mode terminology. In YARN cluster mode, Spark's Application Master (the Spark driver runs in it) will be restarted automatically by the RM up to yarn.resourcemanager.am.max-retries times (the default is 2). -- Ruslan Dautkhanov On Fri, Jul 17, 2015 at 1

Re: Spark job workflow engine recommendations

2015-08-11 Thread Ruslan Dautkhanov
for Spark? -- Ruslan Dautkhanov On Tue, Aug 11, 2015 at 11:30 AM, Hien Luu h...@linkedin.com.invalid wrote: We are in the middle of figuring that out. At the high level, we want to combine the best parts of existing workflow solutions. On Fri, Aug 7, 2015 at 3:55 PM, Vikram Kone vikramk

Re: collect() works, take() returns ImportError: No module named iter

2015-08-10 Thread Ruslan Dautkhanov
. -- Ruslan Dautkhanov On Mon, Aug 10, 2015 at 3:53 PM, YaoPau jonrgr...@gmail.com wrote: I'm running Spark 1.3 on CDH 5.4.4, and trying to set up Spark to run via iPython Notebook. I'm getting collect() to work just fine, but take() errors. (I'm having issues with collect() on other datasets

Re: Spark Number of Partitions Recommendations

2015-08-01 Thread Ruslan Dautkhanov
You should also take into account the amount of memory that you plan to use. It's advised not to give too much memory to each executor, otherwise GC overhead will go up. Btw, why prime numbers? -- Ruslan Dautkhanov On Wed, Jul 29, 2015 at 3:31 AM, ponkin alexey.pon...@ya.ru wrote: Hi Rahul

Re: TCP/IP speedup

2015-08-01 Thread Ruslan Dautkhanov
bandwidth-bound, I can see it'll be a few percent to no improvement. -- Ruslan Dautkhanov On Sat, Aug 1, 2015 at 6:08 PM, Simon Edelhaus edel...@gmail.com wrote: H 2% huh. -- ttfn Simon Edelhaus California 2015 On Sat, Aug 1, 2015 at 3:45 PM, Mark Hamstra m

Re: Is SPARK is the right choice for traditional OLAP query processing?

2015-07-28 Thread Ruslan Dautkhanov
or in-memory columnar storage caching from traditional RDBMS systems, and may get better and/or more predictable performance on BI queries. -- Ruslan Dautkhanov On Mon, Jul 20, 2015 at 6:04 PM, renga.kannan renga.kan...@gmail.com wrote: All, I really appreciate anyone's input on this. We are having

Re: Is IndexedRDD available in Spark 1.4.0?

2015-07-23 Thread Ruslan Dautkhanov
Or Spark on HBase ) http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/ -- Ruslan Dautkhanov On Tue, Jul 14, 2015 at 7:07 PM, Ted Yu yuzhih...@gmail.com wrote: bq. that is, key-value stores Please consider HBase for this purpose :-) On Tue, Jul 14, 2015 at 5:55 PM

Re: Spark equivalent for Oracle's analytical functions

2015-07-12 Thread Ruslan Dautkhanov
Should be part of Spark 1.4: https://issues.apache.org/jira/browse/SPARK-1442 I don't see it in the documentation, though: https://spark.apache.org/docs/latest/sql-programming-guide.html -- Ruslan Dautkhanov On Mon, Jul 6, 2015 at 5:06 AM, gireeshp gireesh.puthum...@augmentiq.in wrote
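Window (analytical) functions did land in the DataFrame API in 1.4; a hedged sketch of an Oracle-style ranking over a partition, with made-up column names:

```python
from pyspark.sql.window import Window
from pyspark.sql import functions as F

# Rank rows within each dept by descending salary, Oracle RANK() OVER (...) style.
w = Window.partitionBy("dept").orderBy(F.col("salary").desc())
df.select("dept", "salary", F.rank().over(w).alias("salary_rank")).show()
```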

Re: RECEIVED SIGNAL 15: SIGTERM

2015-07-12 Thread Ruslan Dautkhanov
"the executor receives a SIGTERM (from whom???)" From the YARN Resource Manager. Check whether YARN fair scheduler preemption and/or speculative execution are turned on; if so, it's quite possible and not a bug. -- Ruslan Dautkhanov On Sun, Jul 12, 2015 at 11:29 PM, Jong Wook Kim jongw...@nyu.edu wrote

Re: Does spark supports the Hive function posexplode function?

2015-07-12 Thread Ruslan Dautkhanov
You can see what Spark SQL functions are supported in Spark by doing the following in a notebook: %sql show functions https://forums.databricks.com/questions/665/is-hive-coalesce-function-supported-in-sparksql.html I think Spark SQL support is currently around Hive ~0.11? -- Ruslan

Re: Caching in spark

2015-07-12 Thread Ruslan Dautkhanov
Hi Akhil, I'm curious whether RDDs are stored internally in a columnar format as well, or whether an RDD is converted to a columnar format only when it is cached in a SQL context. What about data frames? Thanks! -- Ruslan Dautkhanov On Fri, Jul 10, 2015 at 2:07 AM, Akhil Das ak

Re: .NET on Apache Spark?

2015-07-05 Thread Ruslan Dautkhanov
Scala used to run on .NET http://www.scala-lang.org/old/node/10299 -- Ruslan Dautkhanov On Thu, Jul 2, 2015 at 1:26 PM, pedro ski.rodrig...@gmail.com wrote: You might try using .pipe() and installing your .NET program as a binary across the cluster (or using addFile). Its not ideal to pipe
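A minimal sketch of the pipe() suggestion from the quoted reply: each partition's records are streamed to the external program's stdin and every stdout line comes back as a record; the binary path is a placeholder and the program must be available on every worker:

```python
# Pipe each record through an external (.NET/Mono-compiled) executable.
processed = rdd.pipe("/opt/tools/my_dotnet_tool")   # hypothetical binary path
print(processed.take(5))
```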

Re: configuring max sum of cores and memory in cluster through command line

2015-07-05 Thread Ruslan Dautkhanov
://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v5-1-x/Cloudera-Manager-Managing-Clusters/cm5mc_resource_pools.html -- Ruslan Dautkhanov On Thu, Jul 2, 2015 at 4:20 PM, Alexander Waldin awal...@inflection.com wrote: Hi, I'd like to specify the total sum of cores

Re: Problem after enabling Hadoop native libraries

2015-06-30 Thread Ruslan Dautkhanov
You can run hadoop checknative -a and see if bzip2 is detected correctly. -- Ruslan Dautkhanov On Fri, Jun 26, 2015 at 10:18 AM, Marcelo Vanzin van...@cloudera.com wrote: What master are you using? If this is not a local master, you'll need to set LD_LIBRARY_PATH on the executors also

Re: flume sinks supported by spark streaming

2015-06-23 Thread Ruslan Dautkhanov
https://spark.apache.org/docs/latest/streaming-flume-integration.html Yep, avro sink is the correct one. -- Ruslan Dautkhanov On Tue, Jun 23, 2015 at 9:46 PM, Hafiz Mujadid hafizmujadi...@gmail.com wrote: Hi! I want to integrate flume with spark streaming. I want to know which sink

Re: [ERROR] Insufficient Space

2015-06-19 Thread Ruslan Dautkhanov
Vadim, You could edit /etc/fstab, then issue mount -o remount to give more shared memory online. Didn't know Spark uses shared memory. Hope this helps. On Fri, Jun 19, 2015, 8:15 AM Vadim Bichutskiy vadim.bichuts...@gmail.com wrote: Hello Spark Experts, I've been running a standalone Spark

Re: Does MLLib has attribute importance?

2015-06-18 Thread Ruslan Dautkhanov
Got it. Thanks! -- Ruslan Dautkhanov On Thu, Jun 18, 2015 at 1:02 PM, Xiangrui Meng men...@gmail.com wrote: ChiSqSelector calls an RDD of labeled points, where the label is the target. See https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/feature
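A toy sketch of the point in the quoted reply: the target goes in as the LabeledPoint label when fitting ChiSqSelector (MLlib RDD API; the numbers are invented and sc is an existing SparkContext):

```python
from pyspark.mllib.feature import ChiSqSelector
from pyspark.mllib.linalg import Vectors
from pyspark.mllib.regression import LabeledPoint

data = sc.parallelize([
    LabeledPoint(0.0, Vectors.dense([0.0, 1.0, 3.0])),   # label = the target variable
    LabeledPoint(1.0, Vectors.dense([2.0, 0.0, 1.0])),
    LabeledPoint(1.0, Vectors.dense([2.0, 1.0, 1.0])),
])
model = ChiSqSelector(numTopFeatures=2).fit(data)
reduced = model.transform(data.map(lambda lp: lp.features))
```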

Re: Does MLLib has attribute importance?

2015-06-17 Thread Ruslan Dautkhanov
Thank you Xiangrui. Oracle's attribute importance mining function has a target variable. Attribute importance is a supervised function that ranks attributes according to their significance in predicting a target. MLlib's ChiSqSelector does not have a target variable. -- Ruslan Dautkhanov

Does MLLib has attribute importance?

2015-06-11 Thread Ruslan Dautkhanov
in predicting a target. Best regards, Ruslan Dautkhanov

k-means for text mining in a streaming context

2015-06-08 Thread Ruslan Dautkhanov
? Best regards, Ruslan Dautkhanov

Re: Spark Job always cause a node to reboot

2015-06-04 Thread Ruslan Dautkhanov
amount of memory node has. -- Ruslan Dautkhanov On Thu, Jun 4, 2015 at 8:59 AM, Chao Chen kandy...@gmail.com wrote: Hi all, I am new to spark. I am trying to deploy HDFS (hadoop-2.6.0) and Spark-1.3.1 with four nodes, and each node has 8-cores and 8GB memory. One is configured as headnode

Re: How to monitor Spark Streaming from Kafka?

2015-06-02 Thread Ruslan Dautkhanov
Nobody mentioned CM yet? Kafka is now supported by CM/CDH 5.4 http://www.cloudera.com/content/cloudera/en/documentation/cloudera-kafka/latest/PDF/cloudera-kafka.pdf -- Ruslan Dautkhanov On Mon, Jun 1, 2015 at 5:19 PM, Dmitry Goldenberg dgoldenberg...@gmail.com wrote: Thank you, Tathagata

Re: Value for SPARK_EXECUTOR_CORES

2015-05-28 Thread Ruslan Dautkhanov
* spark.shuffle.safetyFraction) / spark.executor.cores. Memory fraction and safety fraction default to 0.2 and 0.8 respectively. I'd test spark.executor.cores with 2, 4, 8, and 16 and see what makes your job run faster. -- Ruslan Dautkhanov On Wed, May 27, 2015 at 6:46 PM, Mulugeta Mammo mulugeta.abe
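A back-of-the-envelope version of that per-task shuffle memory formula, with the defaults from the reply and example values for the knobs being tested (all numbers are assumptions):

```python
executor_memory_mb = 8 * 1024      # spark.executor.memory, assumed 8g
shuffle_memory_fraction = 0.2      # spark.shuffle.memoryFraction default
shuffle_safety_fraction = 0.8      # spark.shuffle.safetyFraction default
executor_cores = 4                 # spark.executor.cores, one of the values to test

per_task_shuffle_mb = (executor_memory_mb * shuffle_memory_fraction
                       * shuffle_safety_fraction) / executor_cores
print(per_task_shuffle_mb)         # 327.68 MB per task with these numbers
```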

Re: PySpark Logs location

2015-05-21 Thread Ruslan Dautkhanov
application logs. -- Ruslan Dautkhanov On Thu, May 21, 2015 at 5:08 AM, Oleg Ruchovets oruchov...@gmail.com wrote: Doesn't work for me so far , using command but got such output. What should I check to fix the issue? Any configuration parameters ... [root@sdo-hdp-bd-master1 ~]# yarn

Re: PySpark Logs location

2015-05-20 Thread Ruslan Dautkhanov
Oleg, You can see applicationId in your Spark History Server. Go to http://historyserver:18088/ Also check https://spark.apache.org/docs/1.1.0/running-on-yarn.html#debugging-your-application It should be no different with PySpark. -- Ruslan Dautkhanov On Wed, May 20, 2015 at 2:12 PM, Oleg

Re: PySpark Logs location

2015-05-20 Thread Ruslan Dautkhanov
You could use yarn logs -applicationId application_1383601692319_0008 -- Ruslan Dautkhanov On Wed, May 20, 2015 at 5:37 AM, Oleg Ruchovets oruchov...@gmail.com wrote: Hi , I am executing PySpark job on yarn ( hortonworks distribution). Could someone pointing me where is the log

Re: Reading Nested Fields in DataFrames

2015-05-11 Thread Ruslan Dautkhanov
Had the same question on stackoverflow recently: http://stackoverflow.com/questions/30008127/how-to-read-a-nested-collection-in-spark Lomig Mégard had a detailed answer on how to do this without using LATERAL VIEW. On Mon, May 11, 2015 at 8:05 AM, Ashish Kumar Singh ashish23...@gmail.com wrote:
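A hedged sketch of what that looks like without LATERAL VIEW: dot notation for struct members and explode() (Spark 1.4+) for arrays of structs; the field names are made up, not taken from the linked question:

```python
from pyspark.sql.functions import explode

# Struct members are reachable with dot notation.
df.select("member.uuid", "member.address.city").show()

# Arrays of structs: explode to one row per element, then dot notation again.
df.select(explode(df["phones"]).alias("phone")) \
  .select("phone.number", "phone.type") \
  .show()
```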