Re: Regarding column partitioning IDs and names as per hierarchical level SparkSQL

2017-11-03 Thread Jean Georges Perrin
Write a UDF?

> On Oct 31, 2017, at 11:48, Aakash Basu  > wrote:
> 
> Hey all,
> 
> Any help in the below please?
> 
> Thanks,
> Aakash.
> 
> 
> -- Forwarded message --
> From: Aakash Basu  >
> Date: Tue, Oct 31, 2017 at 9:17 PM
> Subject: Regarding column partitioning IDs and names as per hierarchical 
> level SparkSQL
> To: user mailto:user@spark.apache.org>>
> 
> 
> Hi all,
> 
> I have to generate a table with Spark-SQL with the following columns -
> 
> 
> Level One Id: VARCHAR(20) NULL
> Level One Name: VARCHAR( 50) NOT NULL
> Level Two Id: VARCHAR( 20) NULL
> Level Two Name: VARCHAR(50) NULL
> Level Thr ee Id: VARCHAR(20) NULL
> Level Thr ee Name: VARCHAR(50) NULL
> Level Four Id: VARCHAR(20) NULL
> Level Four Name: VARCHAR( 50) NULL
> Level Five Id: VARCHAR(20) NULL
> Level Five Name: VARCHAR(50) NULL
> Level Six Id: VARCHAR(20) NULL
> Level Six Name: VARCHAR(50) NULL
> Level Seven Id: VARCHAR( 20) NULL
> Level Seven Name: VARCHAR(50) NULL
> Level Eight Id: VARCHAR( 20) NULL
> Level Eight Name: VARCHAR(50) NULL
> Level Nine Id: VARCHAR(20) NULL
> Level Nine Name: VARCHAR( 50) NULL
> Level Ten Id: VARCHAR(20) NULL
> Level Ten Name: VARCHAR(50) NULL
> 
> My input source has these columns -
> 
> 
> IDDescription ParentID
> 10Great-Grandfather
> 1010  Grandfather 10
> 1010101. Father A 1010
> 1010112. Father B 1010
> 1010124. Father C 1010
> 1010135. Father D 1010
> 1010153. Father E 1010
> 101018Father F1010
> 1010196. Father G 1010
> 101020Father H1010
> 101021Father I1010
> 1010222A. Father J1010
> 10101010  2. Father K 101010
> 
> Like the above, I have ID till 20 digits, which means, I have 10 levels.
> 
> I want to populate the ID and name itself along with all the parents till the 
> root for any particular level, which I am unable to create a concrete logic 
> for.
> 
> Am using this way to fetch respecting levels and populate them in the 
> respective columns but not their parents -
> 
> Present Logic ->
> 
> FinalJoin_DF = spark.sql("select "
>   + "case when length(a.id )/2 = '1' 
> then a.id  else ' ' end as level_one_id, "
> + "case when length(a.id )/2 = '1' then 
> a.desc else ' ' end as level_one_name, "
> + "case when length(a.id )/2 = '2' then 
> a.id  else ' ' end as level_two_id, "
> + "case when length(a.id )/2 = '2' then 
> a.desc else ' ' end as level_two_name, "
>   + "case when length(a.id )/2 = '3' 
> then a.id  else ' ' end as level_three_id, "
>   + "case when length(a.id )/2 = '3' 
> then a.desc else ' ' end as level_three_name, "
>   + "case when length(a.id )/2 = '4' 
> then a.id  else ' ' end as level_four_id, "
>   + "case when length(a.id )/2 = '4' 
> then a.desc else ' ' end as level_four_name, "
>   + "case when length(a.id )/2 = '5' 
> then a.id  else ' ' end as level_five_id, "
>   + "case when length(a.id )/2 = '5' 
> then a.desc else ' ' end as level_five_name, "
>   + "case when length(a.id )/2 = '6' 
> then a.id  else ' ' end as level_six_id, "
> + "case when length(a.id )/2 = '6' then 
> a.desc else ' ' end as level_six_name, "
> + "case when length(a.id )/2 = '7' then 
> a.id  else ' ' end as level_seven_id, "
>   + "case when length(a.id )/2 = '7' 
> then a.desc else ' ' end as level_seven_name, "
>   + "case when length(a.id )/2 = '8' 
> then a.id  else ' ' end as level_eight_id, "
> + "case when length(a.id )/2 = '8' then 
> a.desc else ' ' end as level_eight_name, "
>   + "case when length(a.id )/2 = '9' 
> then a.id  else ' ' end as level_nine_id, "
> + "case when length(a.id )/2 = '9' then 
> a.desc else ' ' end as level_nine_name, "
> + "case when length(a.id )/2 = '10' 
> then a.id  else ' ' end as level_ten_id, "
>   + "case when length(a.id )/2 = '10' 
> then a.desc else ' ' end as level_ten_name "
> + "from CategoryTempTable a")
> 
> 
> Can someone help me in also populating all the parents levels i

Re: Regarding column partitioning IDs and names as per hierarchical level SparkSQL

2017-11-03 Thread ayan guha
you can use 10 passes over the same dataset and build the data


On Fri, Nov 3, 2017 at 9:48 PM, Jean Georges Perrin 
wrote:

> Write a UDF?
>
> On Oct 31, 2017, at 11:48, Aakash Basu  wrote:
>
> Hey all,
>
> Any help in the below please?
>
> Thanks,
> Aakash.
>
>
> -- Forwarded message --
> From: Aakash Basu 
> Date: Tue, Oct 31, 2017 at 9:17 PM
> Subject: Regarding column partitioning IDs and names as per hierarchical
> level SparkSQL
> To: user 
>
>
> Hi all,
>
> I have to generate a table with Spark-SQL with the following columns -
>
>
> Level One Id: VARCHAR(20) NULL
> Level One Name: VARCHAR( 50) NOT NULL
> Level Two Id: VARCHAR( 20) NULL
> Level Two Name: VARCHAR(50) NULL
> Level Thr ee Id: VARCHAR(20) NULL
> Level Thr ee Name: VARCHAR(50) NULL
> Level Four Id: VARCHAR(20) NULL
> Level Four Name: VARCHAR( 50) NULL
> Level Five Id: VARCHAR(20) NULL
> Level Five Name: VARCHAR(50) NULL
> Level Six Id: VARCHAR(20) NULL
> Level Six Name: VARCHAR(50) NULL
> Level Seven Id: VARCHAR( 20) NULL
> Level Seven Name: VARCHAR(50) NULL
> Level Eight Id: VARCHAR( 20) NULL
> Level Eight Name: VARCHAR(50) NULL
> Level Nine Id: VARCHAR(20) NULL
> Level Nine Name: VARCHAR( 50) NULL
> Level Ten Id: VARCHAR(20) NULL
> Level Ten Name: VARCHAR(50) NULL
>
> My input source has these columns -
>
>
> ID Description ParentID
> 10 Great-Grandfather
> 1010 Grandfather 10
> 101010 1. Father A 1010
> 101011 2. Father B 1010
> 101012 4. Father C 1010
> 101013 5. Father D 1010
> 101015 3. Father E 1010
> 101018 Father F 1010
> 101019 6. Father G 1010
> 101020 Father H 1010
> 101021 Father I 1010
> 101022 2A. Father J 1010
> 10101010 2. Father K 101010
> Like the above, I have ID till 20 digits, which means, I have 10 levels.
>
> I want to populate the ID and name itself along with all the parents till
> the root for any particular level, which I am unable to create a concrete
> logic for.
>
> Am using this way to fetch respecting levels and populate them in the
> respective columns but not their parents -
>
> Present Logic ->
>
> FinalJoin_DF = spark.sql("select "
>   + "case when length(a.id)/2 = '1' then a.id
> else ' ' end as level_one_id, "
>   + "case when length(a.id)/2 = '1' then a.desc else ' ' end as
> level_one_name, "
>   + "case when length(a.id)/2 = '2' then a.id else ' ' end as
> level_two_id, "
>   + "case when length(a.id)/2 = '2' then a.desc else ' ' end as
> level_two_name, "
>   + "case when length(a.id)/2 = '3' then a.id
> else ' ' end as level_three_id, "
>   + "case when length(a.id)/2 = '3' then a.desc
> else ' ' end as level_three_name, "
>   + "case when length(a.id)/2 = '4' then a.id
> else ' ' end as level_four_id, "
>   + "case when length(a.id)/2 = '4' then a.desc
> else ' ' end as level_four_name, "
>   + "case when length(a.id)/2 = '5' then a.id
> else ' ' end as level_five_id, "
>   + "case when length(a.id)/2 = '5' then a.desc
> else ' ' end as level_five_name, "
>   + "case when length(a.id)/2 = '6' then a.id
> else ' ' end as level_six_id, "
>   + "case when length(a.id)/2 = '6' then a.desc else ' ' end as
> level_six_name, "
>   + "case when length(a.id)/2 = '7' then a.id else ' ' end as
> level_seven_id, "
>   + "case when length(a.id)/2 = '7' then a.desc
> else ' ' end as level_seven_name, "
>   + "case when length(a.id)/2 = '8' then a.id
> else ' ' end as level_eight_id, "
>   + "case when length(a.id)/2 = '8' then a.desc else ' ' end as
> level_eight_name, "
>   + "case when length(a.id)/2 = '9' then a.id
> else ' ' end as level_nine_id, "
>   + "case when length(a.id)/2 = '9' then a.desc else ' ' end as
> level_nine_name, "
>   + "case when length(a.id)/2 = '10' then a.id else ' ' end as
> level_ten_id, "
>   + "case when length(a.id)/2 = '10' then a.desc
> else ' ' end as level_ten_name "
>   + "from CategoryTempTable a")
>
>
> Can someone help me in also populating all the parents levels in the
> respective level ID and level name, please?
>
>
> Thanks,
> Aakash.
>
>
>


-- 
Best Regards,
Ayan Guha