ISDATE Function
Hi everyone, MS SQL Server supports the ISDATE() function, which reports whether a column value is a valid date. Is there any way to achieve the same check (whether a column's values are dates or not) in Spark SQL? Regards, Ravi
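Spark SQL 1.x has no built-in ISDATE, but the check can be reproduced with a user-defined function. Below is a minimal sketch of the validation logic in plain Python; the function name `is_date` and the accepted format list are my own assumptions for illustration, not Spark or SQL Server API:

```python
from datetime import datetime

def is_date(value, formats=("%Y-%m-%d", "%Y/%m/%d", "%d-%m-%Y")):
    """Return True if `value` parses as a date in any of the given formats,
    mimicking what SQL Server's ISDATE() reports for a column value."""
    if value is None:
        return False
    for fmt in formats:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False
```

A function like this could then be registered as a UDF so it is callable from SQL against the column in question.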
Pagination in Spark SQL
Hi everyone, can you please share an optimized query for pagination in Spark SQL? MS SQL Server supports the OFFSET method for selecting a specific range of rows. Please find the following query:

SELECT BusinessEntityID, [FirstName], [LastName], [JobTitle]
FROM HumanResources.vEmployee
ORDER BY BusinessEntityID
OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY

Does Spark SQL support this OFFSET method? Kindly share any useful details. Regards, Ravi
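Spark SQL 1.x did not support OFFSET ... FETCH; a common workaround is to number the rows (e.g. with ROW_NUMBER() over the sort order) and filter on the index range. A minimal sketch of the offset/limit semantics in plain Python, to make the intended result concrete (the name `paginate` is illustrative, not Spark API):

```python
def paginate(rows, offset, limit, key=None):
    """Emulate ORDER BY key OFFSET `offset` ROWS FETCH NEXT `limit` ROWS ONLY:
    sort the rows, skip `offset` of them, then take the next `limit`."""
    ordered = sorted(rows, key=key)
    return ordered[offset:offset + limit]
```

For example, `paginate(range(100), 10, 10)` returns rows 10 through 19 of the ordered input, matching the SQL Server query above.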
HAVING clause with variance and stddev
Hi, an exception is thrown when using a HAVING clause with VARIANCE or STDDEV. It works perfectly with other aggregate functions (like SUM, COUNT, MIN, MAX):

SELECT SUM(1) AS `sum_number_of_records_ok`
FROM `some_db`.`some_table` `some_table`
GROUP BY 1
HAVING (STDDEV(1) <> 0)

SELECT SUM(1) AS `sum_number_of_records_ok`
FROM `some_db`.`some_table` `some_table`
GROUP BY 1
HAVING (VARIANCE(1) <> 0)

Could you please share any other way of writing this kind of query? Regards, Ravi
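If STDDEV/VARIANCE are rejected inside HAVING, one workaround is to compute the aggregate per group in an inner query and filter on the result in an outer step. A sketch of that two-step shape in plain Python using the statistics module (the function and parameter names are my own, for illustration):

```python
from collections import defaultdict
from statistics import pvariance

def groups_with_variance_above(rows, key, value, threshold):
    """Group rows, compute the population variance of each group's values,
    then filter -- the same effect as GROUP BY ... HAVING VARIANCE(v) > t."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[key(row)].append(value(row))
    return {k: pvariance(vals) for k, vals in grouped.items()
            if pvariance(vals) > threshold}
```

The SQL analogue is a subquery that selects the group key plus VARIANCE(...) as a column, with the comparison moved to the outer query's WHERE clause.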
Exception in spark
Hi all, I got an exception like "org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object" when using certain WHERE-condition queries. I am using Spark 1.4.0, but the same query works perfectly in Hive. Please refer to the following query, which applies date/time casting in the WHERE clause. Could you please help resolve this exception?

SELECT from_unixtime(unix_timestamp(`leavedetails`.`created`,'yyyyMMdd'),'yyyy/MM/dd') AS `leavedetails_created`
FROM `default`.`leavedetails` AS leavedetails
WHERE ((from_unixtime(unix_timestamp(`leavedetails`.`created`,'yyyyMMdd'),'yyyy/MM/dd') IN ('2015/01/01','2015/01/02'))
  AND (NOT(`leavedetails`.`created` IS NULL)))
GROUP BY from_unixtime(unix_timestamp(`leavedetails`.`created`,'yyyyMMdd'),'yyyy/MM/dd')

Regards, Ravi
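The query above only re-renders the `created` string from one pattern to another before comparing it. As a cross-check of the expected values, here is the equivalent reformatting step in plain Python; the patterns mirror the yyyyMMdd to yyyy/MM/dd conversion in the query, and this is a sketch of the semantics, not the Spark code path:

```python
from datetime import datetime

def reformat_created(raw, in_fmt="%Y%m%d", out_fmt="%Y/%m/%d"):
    """Parse the raw string with the input pattern and re-render it with the
    output pattern -- what from_unixtime(unix_timestamp(col, in), out) does."""
    return datetime.strptime(raw, in_fmt).strftime(out_fmt)
```

For example, a stored value of 20150101 renders as 2015/01/01, which is what the IN list in the WHERE clause compares against.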
Re: Exception in spark
Hi Josh, please ignore the stack trace in the last mail. Kindly refer to the exception details:

{org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: 'Sheet1.Teams}

Regards, Ravi

On Wed, Aug 12, 2015 at 1:34 AM, Ravisankar Mani rrav...@gmail.com wrote:

Hi Josh, thanks for your response. Kindly refer to the following query and stack trace. I have checked the same query in Hive, where it works perfectly. If I remove the IN clause from the WHERE condition, it also works in Spark.

SELECT If(ISNOTNULL(SUM(`Sheet1`.`Runs`)),SUM(`Sheet1`.`Runs`),0) AS `Sheet1_Runs`,
       `Sheet1`.`Teams` AS `Sheet1_Teams`
FROM default.Dashboard_Sheet1_20150812010131201 AS `Sheet1`
INNER JOIN (
  SELECT `Sheet1`.`Teams` AS `Teams_0`,
         If(ISNOTNULL(COUNT(`Sheet1`.`Teams`)),COUNT(`Sheet1`.`Teams`),0) AS `Measure_0`
  FROM default.Dashboard_Sheet1_20150812010131201 AS `Sheet1`
  WHERE ((`Sheet1`.`Teams` IN ('Chennai Super Kings'))
    AND (`Sheet1`.`Teams` <> '')
    AND (NOT(`Sheet1`.`Teams` IS NULL)))
  GROUP BY `Sheet1`.`Teams`
  ORDER BY `Measure_0` DESC
  LIMIT 5
) `T_0` ON (`Sheet1`.`Teams` = `T_0`.`Teams_0`)
WHERE ((`Sheet1`.`Teams` IN ('Chennai Super Kings'))
  AND (`Sheet1`.`Teams` <> '')
  AND (NOT(`Sheet1`.`Teams` IS NULL)))
GROUP BY `Sheet1`.`Teams`

at Syncfusion.ThriftHive.Base.HqlCommand.ExecuteReader()
at Syncfusion.Dashboard.Base.Data.HiveQueryBuilder.ExecuteReaderQuery(String query, String connectionString) in f:\Back_To_Svn\source\base\dashboard.base\src\Data\HiveServer.cs:line 409
at Syncfusion.Dashboard.Base.Data.ServerDataProvider.GetTable(String tableName, List`1 schemaInfoCollection, List`1 originalSchemaInfoCollection, List`1 initialFilterList, List`1 viewerFilterList, Boolean isSelectQueryForServerModeFilterPopup, Boolean isNestedFilter) in f:\Back_To_Svn\source\base\dashboard.base\src\Data\Data.cs:line 536
at Syncfusion.Dashboard.Base.Engine.RelationalEngine.GetDataFromMainSource(List`1 schemaInfos, List`1 viewerFilterList) in f:\Back_To_Svn\source\base\dashboard.base\src\Engine\RelationDataEngine.cs:line 902
at DashboardService.DashboardService.ApplyFilterServerMode(RelationalEngine engine, String ReportName, Boolean Drilled, Boolean useDefaultProperties, Boolean isParamsColumn, SchemaInfo paramsSchema) in f:\Back_To_Svn\source\js\dashboardsamples\wcf\Common\DataProcessor.cs:line 1030
at DashboardService.DashboardService.ApplyFilter(RelationalEngine engine, String ReportName, Boolean Drilled, Boolean useDefaultProperties, Boolean isParamsColumn, SchemaInfo paramsSchema) in f:\Back_To_Svn\source\js\dashboardsamples\wcf\Common\DataProcessor.cs:line 1047
at DashboardService.DashboardService.GetProcessedControlData(List`1 elements, RelationalEngine dataEngine, DashboardItem control, RelationalReport currentReport) in f:\Back_To_Svn\source\js\dashboardsamples\wcf\Common\DataProcessor.cs:line 205
at DashboardService.DashboardService.GetGridControlData(DashboardItem controlObj, List`1 columnList, List`1 gridSchemaInfos, Dictionary`2 barData, Dictionary`2 formattedColumns, List`1 colourSaturationColumns) in f:\Back_To_Svn\source\js\dashboardsamples\wcf\Controls\DashboardGrid.cs:line 115
at DashboardService.DashboardService.GetGridData(DashboardItem controlObj, Boolean isLoadGrid) in f:\Back_To_Svn\source\js\dashboardsamples\wcf\Controls\DashboardGrid.cs:line 47
at DashboardService.DashboardService.ControlConfigHelper(DashboardItem control, Boolean IsLoad, DataFetcherGrid fn) in f:\Back_To_Svn\source\js\dashboardsamples\wcf\DashboardService.svc.cs:line 1269

Regards, Ravi

On Tue, Aug 11, 2015 at 11:53 PM, Josh Rosen rosenvi...@gmail.com wrote:

Can you share a query or stack trace? More information would make this question easier to answer.

On Tue, Aug 11, 2015 at 8:50 PM, Ravisankar Mani rrav...@gmail.com wrote:

Hi all, we got an exception like "org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object" when using some WHERE-condition queries. I am using Spark 1.4.0. Is this exception resolved in the latest Spark? Regards, Ravi
Hive Query(Top N)
Hi everyone, I have tried to achieve a hierarchy-based (index mode) top-N selection using a Spark query. The following query takes a long time to execute:

SELECT SUM(`adventurepersoncontacts`.`contactid`) AS `adventurepersoncontacts_contactid`,
       `adventurepersoncontacts`.`fullname` AS `adventurepersoncontacts_fullname`
FROM `default`.`adventurepersoncontacts` AS adventurepersoncontacts
JOIN (
  SELECT `F_0`.`fullname_0_0` AS `fullname_0_0`,
         ROW_NUMBER() OVER (ORDER BY `F_0`.`Measure_0` DESC) AS `R_N_0`
  FROM (
    SELECT `adventurepersoncontacts`.`fullname` AS `fullname_0_0`,
           SUM(adventurepersoncontacts.contactid) AS `Measure_0`
    FROM `default`.`adventurepersoncontacts` AS adventurepersoncontacts
    GROUP BY `adventurepersoncontacts`.`fullname`
  ) `F_0`
) `T_0` ON ((`adventurepersoncontacts`.`fullname` = `T_0`.`fullname_0_0`) AND (`T_0`.`R_N_0` <= 5))
GROUP BY `adventurepersoncontacts`.`fullname`

In this query, I assign a row index to every group according to the aggregation. A ROW_NUMBER() calculation directly over the aggregate, like ROW_NUMBER() OVER (ORDER BY SUM(columnname) DESC), is not supported in Spark, so I used a subquery:

SELECT `F_0`.`fullname_0_0` AS `fullname_0_0`,
       ROW_NUMBER() OVER (ORDER BY `F_0`.`Measure_0` DESC) AS `R_N_0`
FROM (
  SELECT `adventurepersoncontacts`.`fullname` AS `fullname_0_0`,
         SUM(adventurepersoncontacts.contactid) AS `Measure_0`
  FROM `default`.`adventurepersoncontacts` AS adventurepersoncontacts
  GROUP BY `adventurepersoncontacts`.`fullname`
) `F_0`

Execution slows down in this form. If I remove the main GROUP BY (that is, remove the outer aggregation), execution becomes very fast. I mention the following query only to show that the slowness appears when I include the main GROUP BY:

SELECT `adventurepersoncontacts`.`contactid` AS `adventurepersoncontacts_contactid`,
       `adventurepersoncontacts`.`fullname` AS `adventurepersoncontacts_fullname`
FROM `default`.`adventurepersoncontacts` AS adventurepersoncontacts
JOIN (
  SELECT `F_0`.`fullname_0_0` AS `fullname_0_0`,
         ROW_NUMBER() OVER (ORDER BY `F_0`.`Measure_0` DESC) AS `R_N_0`
  FROM (
    SELECT `adventurepersoncontacts`.`fullname` AS `fullname_0_0`,
           SUM(adventurepersoncontacts.contactid) AS `Measure_0`
    FROM `default`.`adventurepersoncontacts` AS adventurepersoncontacts
    GROUP BY `adventurepersoncontacts`.`fullname`
  ) `F_0`
) `T_0` ON ((`adventurepersoncontacts`.`fullname` = `T_0`.`fullname_0_0`) AND (`T_0`.`R_N_0` <= 5))

I have found another way to express this hierarchical query: I created a temp table that holds the inner subquery from the previous query:

CREATE EXTERNAL TABLE IF NOT EXISTS temp_table AS
SELECT SUM(adventurepersoncontacts.contactid) AS contactid,
       adventurepersoncontacts.fullname AS fullname
FROM `default`.`adventurepersoncontacts` AS adventurepersoncontacts
GROUP BY `adventurepersoncontacts`.`fullname`;

SELECT SUM(`adventurepersoncontacts`.`contactid`) AS `adventurepersoncontacts_contactid`,
       `adventurepersoncontacts`.`fullname` AS `adventurepersoncontacts_fullname`
FROM `default`.`adventurepersoncontacts` AS adventurepersoncontacts
JOIN (
  SELECT `fullname` AS `fullname_0_0`,
         ROW_NUMBER() OVER (ORDER BY `contactid` DESC) AS `R_N_0`
  FROM default.temp_table
) `T_0` ON ((`adventurepersoncontacts`.`fullname` = `T_0`.`fullname_0_0`) AND (`T_0`.`R_N_0` <= 2))
GROUP BY `adventurepersoncontacts`.`fullname`;

But this approach is also slow because of the CREATE TABLE ... AS SELECT step. Please suggest any other optimized way to achieve this requirement. Regards, Ravi
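The intent of the join above is "find the top-N groups by an aggregate, then aggregate again over only those groups". A compact sketch of that logic in plain Python, which can also serve as a sanity check on the SQL's expected output (the function and parameter names are my own):

```python
from collections import defaultdict

def top_n_group_sums(rows, key, value, n):
    """Sum `value` per `key`, then keep the n groups with the largest sums --
    the effect of filtering ROW_NUMBER() OVER (ORDER BY SUM(v) DESC) <= n."""
    sums = defaultdict(int)
    for row in rows:
        sums[key(row)] += value(row)
    return dict(sorted(sums.items(), key=lambda kv: kv[1], reverse=True)[:n])
```

Note that the sums are computed once per group here; in the SQL version the outer GROUP BY re-aggregates the joined rows, which is where the extra cost appears.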
UDFs in Spark
Hi everyone, as mentioned in the Spark SQL programming guide, Spark SQL supports Hive UDFs. I have built the UDFs in the Hive metastore, and they work perfectly over a Hive connection. But they do not work in Spark (java.lang.RuntimeException: Couldn't find function DATE_FORMAT). Could you please help with how to use these Hive UDFs in Spark? Regards, Ravisankar M R
Spark performance
Hi everyone, I have planned to move from MS SQL Server to Spark. I am working with around 50,000 to 1 lakh (100,000) records. Spark's performance is slow compared to MS SQL Server. Which is the better database (Spark or SQL Server) for storing and retrieving around 50,000 to 100,000 records? Regards, Ravi
Performance slow
Hi everyone, queries take much more time when I use (GROUP BY + ORDER BY) or (GROUP BY + CAST + ORDER BY) in the same query. Kindly refer to the following query. Could you please provide any solution regarding this performance issue?

SELECT If(ISNOTNULL(SUM(CAST(adventurepersoncontacts.contactid AS decimal(38,6)))),
          SUM(CAST(adventurepersoncontacts.contactid AS decimal(38,6))),
          0) AS adventurepersoncontacts_contactid,
       adventurepersoncontacts.fullname AS adventurepersoncontacts_fullname
FROM default.adventurepersoncontacts AS adventurepersoncontacts
GROUP BY adventurepersoncontacts.fullname
ORDER BY adventurepersoncontacts.fullname ASC

Regards, Ravisankar M R
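The If(ISNOTNULL(SUM(x)), SUM(x), 0) pattern in the query is just a null-safe sum, equivalent to COALESCE(SUM(x), 0). A sketch of the same fallback in plain Python, to make the intended semantics explicit (the function name is illustrative):

```python
def null_safe_sum(values, default=0):
    """Sum the non-null values; if there are none (so SQL's SUM would return
    NULL), return the default instead -- If(ISNOTNULL(SUM(x)), SUM(x), 0)."""
    present = [v for v in values if v is not None]
    return sum(present) if present else default
```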
Day of year
Hi everyone, I can't find a way to get 'day of year' in a Spark SQL query. Can you suggest any way to compute the day of year? Regards, Ravi
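When no built-in day-of-year function is available, the value can be computed in a user-defined function. The underlying computation in plain Python is a one-liner (the function name `dayofyear` is my own, for illustration):

```python
from datetime import date

def dayofyear(d):
    """Return the 1-based day-of-year index for a date, e.g. Feb 1 -> day 32."""
    return d.timetuple().tm_yday
```

A function like this could be registered as a UDF and applied to a date column from SQL.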