Unable to see completed application in Spark 2 history web UI

2018-08-07 Thread Fawze Abujaber
Hello Community, I'm using Spark 2.3 and Spark 1.6.0 in my cluster with Cloudera distribution 5.13.0. Both are configured to run on YARN, but I'm unable to see completed applications in the Spark 2 history server, while in Spark 1.6.0 I can. 1) I checked the HDFS permissions for both folders and both
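Beyond permissions, a common first check is that event logging and the Spark 2 history server point at the same directory. A sketch of the relevant spark-defaults.conf entries (the HDFS path below is the Cloudera-style default and an assumption, not taken from the thread):

```
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///user/spark/spark2ApplicationHistory
spark.history.fs.logDirectory    hdfs:///user/spark/spark2ApplicationHistory
```

If the two directory properties disagree, applications finish and log events, but the history server never lists them.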

Re: Split a row into multiple rows Java

2018-08-07 Thread nookala
+-----+---------+----+----+----+
| name|otherName|val1|val2|val3|
+-----+---------+----+----+----+
|  bob|       b1|   1|   2|   3|
|alive|       c1|   3|   4|   6|
|  eve|       e1|   7|   8|   9|
+-----+---------+----+----+----+
I need this to become
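The Spark-side answer is usually an explode over an array of the value columns; the reshaping itself can be sketched in plain Scala (the case-class and column names here are assumptions based on the table shown, not confirmed by the thread):

```scala
// Wide row with three value columns, matching the table in the message.
case class Wide(name: String, otherName: String, val1: Int, val2: Int, val3: Int)

// Tall row: one value per output row.
case class Tall(name: String, otherName: String, value: Int)

// Emit one Tall row per value column -- the collection analogue of
// explode(array(...)) in Spark SQL.
def toTall(rows: Seq[Wide]): Seq[Tall] =
  rows.flatMap(w => Seq(w.val1, w.val2, w.val3).map(v => Tall(w.name, w.otherName, v)))
```

In Spark itself the equivalent is roughly df.select($"name", $"otherName", explode(array($"val1", $"val2", $"val3"))).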

Re: Insert into dynamic partitioned hive/parquet table throws error - Partition spec contains non-partition columns

2018-08-07 Thread Nirav Patel
FYI, it works with static partitioning: spark.sql("insert overwrite table mytable PARTITION(P1=1085, P2=164590861) select c1, c2,..cn, P1, P2 from updateTable") On Thu, Aug 2, 2018 at 5:01 PM, Nirav Patel wrote: > I am trying to insert overwrite multiple partitions into an existing > partitioned
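For the dynamic case, Hive-style inserts normally need nonstrict dynamic-partition mode enabled first, and the partition columns must come last in the SELECT. A sketch of what that usually looks like (the settings are standard Hive properties, not confirmed by this thread as the fix):

```
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT OVERWRITE TABLE mytable PARTITION (P1, P2)
SELECT c1, c2, P1, P2 FROM updateTable;
```

The "Partition spec contains non-partition columns" error typically points at a mismatch between the columns named in PARTITION(...) and the table's declared partition columns.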

Updating dynamic partitioned hive table throws error - Partition spec contains non-partition columns

2018-08-07 Thread nirav
I am using Spark 2.2.1 and Hive 2.1. I am trying to insert overwrite multiple partitions into an existing partitioned Hive/Parquet table. The table was created using SparkSession. I have a table 'mytable' with partitions P1 and P2. I have the following set on the SparkSession object:

Re: Newbie question on how to extract column value

2018-08-07 Thread James Starks
Because of some legacy issues I can't immediately upgrade the Spark version. But I try to filter the data before loading it into Spark, based on the suggestion: val df = sparkSession.read.format("jdbc").option(...).option("dbtable", "(select .. from ... where url <> '') table_name").load()
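The dbtable trick works because the JDBC source accepts any subquery aliased as a table, so the database applies the WHERE clause before any rows reach Spark. A plain-Scala sketch of building that option value (the table, columns, and alias are placeholders, not from the thread):

```scala
// Wrap a filtering query as an aliased subquery, usable anywhere the
// JDBC source expects a table name; the predicate is evaluated by the
// database, so filtered-out rows are never transferred to Spark.
def dbtableSubquery(table: String, predicate: String): String =
  s"(select id, url from $table where $predicate) pushed_down"

// Would be passed to the reader roughly as:
//   spark.read.format("jdbc")
//     .option("dbtable", dbtableSubquery("table_a", "url <> ''"))
//     .load()
```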

Re: Newbie question on how to extract column value

2018-08-07 Thread Gourav Sengupta
Hi James, It is always advisable to use the latest Spark version. That said, can you please give DataFrames and UDFs a try if possible? I think that would be a much more scalable way to address the issue. Also, where possible, it is always advisable to use the filter option before fetching the

Newbie question on how to extract column value

2018-08-07 Thread James Starks
I am very new to Spark. I just successfully set up Spark SQL connecting to a PostgreSQL database, and am able to display the table with the code sparkSession.sql("SELECT id, url from table_a where col_b <> '' ").show() Now I want to perform filter and map functions on the col_b value. In plain Scala it
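The filter-then-map the poster is after can be sketched on plain Scala collections before translating it to a Dataset (the ids and URLs below are made up for illustration):

```scala
// Sample (id, url) rows standing in for the JDBC result.
val rows = Seq((1L, "https://a.example"), (2L, ""), (3L, "https://b.example"))

// Keep rows with a non-empty url, then transform the url value --
// the collection analogue of Dataset.filter followed by Dataset.map.
val transformed = rows.filter { case (_, url) => url.nonEmpty }
                      .map { case (id, url) => (id, url.toUpperCase) }
```

On a Dataset of a matching case class, the same two calls (filter, then map) apply directly.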

Dynamic partitioning weird behavior

2018-08-07 Thread Nikolay Skovpin
Hi guys. I was investigating the Spark property spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic"). It works perfectly on a local filesystem, but on S3 I stumbled into strange behavior. If I don't have a Hive table, or the table is empty, Spark won't save any data into this table with
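For context, the property in question (Spark 2.3+) switches overwrite from whole-table to per-partition; a minimal sketch of the usage being discussed (table name assumed for illustration):

```
// Overwrite only the partitions present in df, not the whole table:
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
df.write.mode("overwrite").insertInto("mytable")
```

The S3 caveat in the thread is plausible because overwrite semantics depend on the output committer's listing and rename behavior, which differ between HDFS/local filesystems and object stores.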

need workaround around HIVE-11625 / DISTRO-800

2018-08-07 Thread Pranav Agrawal
I am hitting https://issues.cloudera.org/browse/DISTRO-800 (related to https://issues.apache.org/jira/browse/HIVE-11625). I am unable to write an empty array of type int or string (an array of size 0) to Parquet. Please assist or suggest a workaround. Spark version: 2.2.1 AWS EMR:
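One workaround commonly suggested for this class of issue is to replace empty arrays with null before writing, so the Parquet writer never sees a zero-length repeated group. Whether null is acceptable is schema-dependent; this is an assumption, not a confirmed fix for DISTRO-800. The core of it in plain Scala:

```scala
// Map a zero-length (or null) sequence to null; anything else passes
// through unchanged. In Spark this would be wrapped in a UDF and
// applied to the array column before the Parquet write.
def nullifyEmpty[A](xs: Seq[A]): Seq[A] =
  if (xs == null || xs.isEmpty) null else xs
```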