Re: Cache Join

2019-03-28 Thread Ilya Kasnacheev
Hello!

Do you really need such huge number of fields to join? Isn't it easier to
introduce an affinity field? Anyway, you seem to create an index with a
huge number of fields. Consider only keeping fields which are necessary for
performing that join. You might even need to boil it down to some single
field (I'm not sure how actual distributedJoins is implemented).

If you could frame it as stand-alone project, somebody from community could
try and tweak it until it works.

Regards,
-- 
Ilya Kasnacheev


чт, 28 мар. 2019 г. в 06:23, Skollur :

> Hello
>
> I am using apache ignite 2.7 version and trying to join three tables and My
> query looks as below. All tables are configured as PARTIONED and also set
> "distributedjoin=true"
> ==
> SELECT
> CAS.Group_Customer_ID as groupCustomerId,
> CAS.Account_Number as accountNumber,
> 'Custody Account' as productTitle,
> CAS.Investment_Model as investmentModel,
> sum(CAS.Market_Value) AS marketValue,
> CSS.Category_Sequence as sequenceNumber,
> sum(ASM.Prev_Month_Bal)  as preMonthBal,
> sum(ASM.Prev_Year_Bal) as preYearBal
>  FROM
> "VwCustodyAccountPositionCpsCache".VW_CUSTODY_ACCOUNT_POSITION_CPS
> CAS
> INNER JOIN "IdbCpsSequenceCache".IDB_CPS_SEQUENCE CSS ON
> CAS.Account_Type = CSS.Account_Type
> AND CAS.As_Of_Date BETWEEN '2018-12-31' AND '2018-12-31'
> AND CAS.Group_Customer_ID = 61
>  INNER JOIN "InvestmentSummaryMonthlyCache".ASSET_SUMMARY_MONTHLY ASM ON
> CAS.Group_Customer_ID = ASM.Group_Customer_ID
> AND CAS.Account_Number = ASM.Account_Number
> AND CAS.As_of_Date = ASM.Effective_Date
> AND CAS.Account_Type = ASM.Account_Type
> AND CAS.Product = ASM.Product
> AND CAS.Asset_Class_Allocation = ASM.Asset_Class
> group by
> CAS.Group_Customer_ID,
> CAS.Account_Number,
> CAS.Investment_Model,
> CSS.Category_Sequence
> =
> I have added index for each fields which are used in each cache table as
> below. Example in InvestmentSummaryMonthlyCache:-
>
>ArrayList indexes = new ArrayList<>();
> QueryIndex index = new QueryIndex();
> index.setName("NonClustered_Index_Summary_Monthly");
> index.setIndexType(QueryIndexType.SORTED);
> LinkedHashMap indFlds = new LinkedHashMap<>();
> indFlds.put("dwId", true);
> indFlds.put("groupCustomerId", true);
> indFlds.put("accountNumber", true);
> indFlds.put("accountType", true);
> indFlds.put("product", true);
> indFlds.put("productType", true);
> indFlds.put("subProduct", true);
> indFlds.put("assetClass", true);
> indFlds.put("effectiveDate", true);
> index.setFields(indFlds);
> indexes.add(index);
> qryEntity.setIndexes(indexes);
> qryEntities.add(qryEntity);
> =
> When I tried to execute query(I have all fields defined in index as above),
> getting an error -> SQL Error [5]: javax.cache.CacheException: Failed
> to
> prepare distributed join query: join condition does not use index
> [joinedCache=InvestmentSummaryMonthlyCache, What am I missing here?
>
>
> plan=SELECT
> CAS__Z0.GROUP_CUSTOMER_ID AS __C0_0,
> CAS__Z0.ACCOUNT_NUMBER AS __C0_1,
> 'Custody Account' AS __C0_2,
> CAS__Z0.INVESTMENT_MODEL AS __C0_3,
> SUM(CAS__Z0.MARKET_VALUE) AS __C0_4,
> CSS__Z1.CATEGORY_SEQUENCE AS __C0_5,
> SUM(ASM__Z2.PREV_MONTH_BAL) AS __C0_6,
> SUM(ASM__Z2.PREV_YEAR_BAL) AS __C0_7
> FROM "VwCustodyAccountPositionCpsCache".VW_CUSTODY_ACCOUNT_POSITION_CPS
> CAS__Z0
> /*
> "VwCustodyAccountPositionCpsCache".VW_CUSTODY_ACCOUNT_POSITION_CPS.__SCAN_
> */
> /* WHERE (CAS__Z0.AS_OF_DATE <= DATE '2018-12-31')
> AND ((CAS__Z0.GROUP_CUSTOMER_ID = 80061)
> AND (CAS__Z0.AS_OF_DATE >= DATE '2018-12-31'))
> */
> INNER JOIN "InvestmentSummaryMonthlyCache".ASSET_SUMMARY_MONTHLY ASM__Z2
> /* batched:broadcast
> "InvestmentSummaryMonthlyCache".ASSET_SUMMARY_MONTHLY.__SCAN_ */
> ON 1=1
> /* WHERE (CAS__Z0.ACCOUNT_NUMBER = ASM__Z2.ACCOUNT_NUMBER)
> AND ((CAS__Z0.GROUP_CUSTOMER_ID = ASM__Z2.GROUP_CUSTOMER_ID)
> AND ((CAS__Z0.AS_OF_DATE = ASM__Z2.EFFECTIVE_DATE)
> AND ((CAS__Z0.ACCOUNT_TYPE = ASM__Z2.ACCOUNT_TYPE)
> AND ((CAS__Z0.ASSET_CLASS_ALLOCATION = ASM__Z2.ASSET_CLASS)
> AND (CAS__Z0.PRODUCT = ASM__Z2.PRODUCT)
> */
> INNER JOIN "IdbCpsSequenceCache".IDB_CPS_SEQUENCE CSS__Z1
> /* batched:broadcast "IdbCpsSequenceCache".NONCLUSTERED_INDEX_LC:
> ACCOUNT_TYPE = CAS__Z0.ACCOUNT_TYPE */
> ON 1=1
> WHERE ((CAS__Z0.GROUP_CUSTOMER_ID = 80061)
> AND ((CAS__Z0.ACCOUNT_TYPE = CSS__Z1.ACCOUNT_TYPE)
> AND ((CAS__Z0.AS_OF_DATE >= DATE '2018-12-31')
> AND (CAS__Z0.AS_OF_DATE <= DATE 

Cache Join

2019-03-27 Thread Skollur
Hello

I am using apache ignite 2.7 version and trying to join three tables and My
query looks as below. All tables are configured as PARTIONED and also set
"distributedjoin=true" 
==
SELECT 
CAS.Group_Customer_ID as groupCustomerId,
CAS.Account_Number as accountNumber,
'Custody Account' as productTitle,
CAS.Investment_Model as investmentModel,
sum(CAS.Market_Value) AS marketValue,
CSS.Category_Sequence as sequenceNumber,
sum(ASM.Prev_Month_Bal)  as preMonthBal,
sum(ASM.Prev_Year_Bal) as preYearBal  
 FROM
"VwCustodyAccountPositionCpsCache".VW_CUSTODY_ACCOUNT_POSITION_CPS CAS 
INNER JOIN "IdbCpsSequenceCache".IDB_CPS_SEQUENCE CSS ON 
CAS.Account_Type = CSS.Account_Type  
AND CAS.As_Of_Date BETWEEN '2018-12-31' AND '2018-12-31' 
AND CAS.Group_Customer_ID = 61 
 INNER JOIN "InvestmentSummaryMonthlyCache".ASSET_SUMMARY_MONTHLY ASM ON
CAS.Group_Customer_ID = ASM.Group_Customer_ID  
AND CAS.Account_Number = ASM.Account_Number   
AND CAS.As_of_Date = ASM.Effective_Date 
AND CAS.Account_Type = ASM.Account_Type  
AND CAS.Product = ASM.Product 
AND CAS.Asset_Class_Allocation = ASM.Asset_Class
group by  
CAS.Group_Customer_ID,
CAS.Account_Number,
CAS.Investment_Model,
CSS.Category_Sequence
=
I have added index for each fields which are used in each cache table as
below. Example in InvestmentSummaryMonthlyCache:-

   ArrayList indexes = new ArrayList<>();
QueryIndex index = new QueryIndex();
index.setName("NonClustered_Index_Summary_Monthly");
index.setIndexType(QueryIndexType.SORTED);
LinkedHashMap indFlds = new LinkedHashMap<>();
indFlds.put("dwId", true);
indFlds.put("groupCustomerId", true);
indFlds.put("accountNumber", true);
indFlds.put("accountType", true);
indFlds.put("product", true);
indFlds.put("productType", true);
indFlds.put("subProduct", true);
indFlds.put("assetClass", true);
indFlds.put("effectiveDate", true);
index.setFields(indFlds);
indexes.add(index);
qryEntity.setIndexes(indexes);
qryEntities.add(qryEntity);
=
When I tried to execute query(I have all fields defined in index as above),
getting an error -> SQL Error [5]: javax.cache.CacheException: Failed to
prepare distributed join query: join condition does not use index
[joinedCache=InvestmentSummaryMonthlyCache, What am I missing here?


plan=SELECT
CAS__Z0.GROUP_CUSTOMER_ID AS __C0_0,
CAS__Z0.ACCOUNT_NUMBER AS __C0_1,
'Custody Account' AS __C0_2,
CAS__Z0.INVESTMENT_MODEL AS __C0_3,
SUM(CAS__Z0.MARKET_VALUE) AS __C0_4,
CSS__Z1.CATEGORY_SEQUENCE AS __C0_5,
SUM(ASM__Z2.PREV_MONTH_BAL) AS __C0_6,
SUM(ASM__Z2.PREV_YEAR_BAL) AS __C0_7
FROM "VwCustodyAccountPositionCpsCache".VW_CUSTODY_ACCOUNT_POSITION_CPS
CAS__Z0
/*
"VwCustodyAccountPositionCpsCache".VW_CUSTODY_ACCOUNT_POSITION_CPS.__SCAN_
*/
/* WHERE (CAS__Z0.AS_OF_DATE <= DATE '2018-12-31')
AND ((CAS__Z0.GROUP_CUSTOMER_ID = 80061)
AND (CAS__Z0.AS_OF_DATE >= DATE '2018-12-31'))
*/
INNER JOIN "InvestmentSummaryMonthlyCache".ASSET_SUMMARY_MONTHLY ASM__Z2
/* batched:broadcast
"InvestmentSummaryMonthlyCache".ASSET_SUMMARY_MONTHLY.__SCAN_ */
ON 1=1
/* WHERE (CAS__Z0.ACCOUNT_NUMBER = ASM__Z2.ACCOUNT_NUMBER)
AND ((CAS__Z0.GROUP_CUSTOMER_ID = ASM__Z2.GROUP_CUSTOMER_ID)
AND ((CAS__Z0.AS_OF_DATE = ASM__Z2.EFFECTIVE_DATE)
AND ((CAS__Z0.ACCOUNT_TYPE = ASM__Z2.ACCOUNT_TYPE)
AND ((CAS__Z0.ASSET_CLASS_ALLOCATION = ASM__Z2.ASSET_CLASS)
AND (CAS__Z0.PRODUCT = ASM__Z2.PRODUCT)
*/
INNER JOIN "IdbCpsSequenceCache".IDB_CPS_SEQUENCE CSS__Z1
/* batched:broadcast "IdbCpsSequenceCache".NONCLUSTERED_INDEX_LC:
ACCOUNT_TYPE = CAS__Z0.ACCOUNT_TYPE */
ON 1=1
WHERE ((CAS__Z0.GROUP_CUSTOMER_ID = 80061)
AND ((CAS__Z0.ACCOUNT_TYPE = CSS__Z1.ACCOUNT_TYPE)
AND ((CAS__Z0.AS_OF_DATE >= DATE '2018-12-31')
AND (CAS__Z0.AS_OF_DATE <= DATE '2018-12-31'
AND ((CAS__Z0.ASSET_CLASS_ALLOCATION = ASM__Z2.ASSET_CLASS)
AND ((CAS__Z0.PRODUCT = ASM__Z2.PRODUCT)
AND ((CAS__Z0.ACCOUNT_TYPE = ASM__Z2.ACCOUNT_TYPE)
AND ((CAS__Z0.AS_OF_DATE = ASM__Z2.EFFECTIVE_DATE)
AND ((CAS__Z0.GROUP_CUSTOMER_ID = ASM__Z2.GROUP_CUSTOMER_ID)
AND (CAS__Z0.ACCOUNT_NUMBER = ASM__Z2.ACCOUNT_NUMBER))
GROUP BY CAS__Z0.GROUP_CUSTOMER_ID, CAS__Z0.ACCOUNT_NUMBER,
CAS__Z0.INVESTMENT_MODEL, CSS__Z1.CATEGORY_SEQUENCE]
  SQL Error [5]: javax.cache.CacheException: Failed to prepare
distributed join query: join condition does not use index
[joinedCache=InvestmentSummaryMonthlyCache, plan=SELECT
CAS__Z0.GROUP_CUSTOMER_ID AS __C0_0,
CAS__Z0.ACCOUNT_NUMBER AS __C0_1,
   

Re: Poor cross cache join SQL performance (v2.0.0)

2017-07-12 Thread afedotov
Hi,

As well as for regular DB, each group index relates only to one
entity/table, so there is no such thing like cross cache group index.

Kind regards,
Alex.

On Wed, Jul 12, 2017 at 3:22 PM, iostream [via Apache Ignite Users] <
ml+s70518n14719...@n6.nabble.com> wrote:

> Hi,
>
> Apologies for the delayed response.
>
> Implementing appropriate group indexes solved the problem for me.
>
> One question - can I create a group index across caches? Or group index
> can be created only within one cache?
>
> Thanks! :)
>
> --
> If you reply to this email, your message will be added to the discussion
> below:
> http://apache-ignite-users.70518.x6.nabble.com/Poor-cross-cache-join-SQL-
> performance-v2-0-0-tp14373p14719.html
> To start a new topic under Apache Ignite Users, email
> ml+s70518n1...@n6.nabble.com
> To unsubscribe from Apache Ignite Users, click here
> <http://apache-ignite-users.70518.x6.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code=1=YWxleGFuZGVyLmZlZG90b2ZmQGdtYWlsLmNvbXwxfC0xMzYxNTU0NTg=>
> .
> NAML
> <http://apache-ignite-users.70518.x6.nabble.com/template/NamlServlet.jtp?macro=macro_viewer=instant_html%21nabble%3Aemail.naml=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>




--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Poor-cross-cache-join-SQL-performance-v2-0-0-tp14373p14721.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Poor cross cache join SQL performance (v2.0.0)

2017-07-12 Thread iostream
Hi,

Apologies for the delayed response.

Implementing appropriate group indexes solved the problem for me.

One question - can I create a group index across caches? Or group index can
be created only within one cache?

Thanks! :)



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Poor-cross-cache-join-SQL-performance-v2-0-0-tp14373p14719.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Poor cross cache join SQL performance (v2.0.0)

2017-07-07 Thread Alexander Fedotov
Hi,

It would be great to take a look on a corresponding JFR.

Kind regards,
Alex


6 июля 2017 г. 11:55 AM пользователь "afedotov" <
alexander.fedot...@gmail.com> написал:

Could you please share the reproducer?

Kind regards,
Alex

6 июля 2017 г. 11:18 AM пользователь "iostream [via Apache Ignite
Users]" <[hidden
email] <http:///user/SendEmail.jtp?type=node=14384=0>> написал:

> Hi Alex,
>
> Yes Value2.class was already added to indextypes.
>
> I am only reading the value which is why copyOnRead is set to "false".
>
> --
> If you reply to this email, your message will be added to the discussion
> below:
> http://apache-ignite-users.70518.x6.nabble.com/Poor-cross-
> cache-join-SQL-performance-v2-0-0-tp14373p14382.html
> To start a new topic under Apache Ignite Users, email [hidden email]
> <http:///user/SendEmail.jtp?type=node=14384=1>
> To unsubscribe from Apache Ignite Users, click here.
> NAML
> <http://apache-ignite-users.70518.x6.nabble.com/template/NamlServlet.jtp?macro=macro_viewer=instant_html%21nabble%3Aemail.naml=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>

--
View this message in context: Re: Poor cross cache join SQL performance
(v2.0.0)
<http://apache-ignite-users.70518.x6.nabble.com/Poor-cross-cache-join-SQL-performance-v2-0-0-tp14373p14384.html>
Sent from the Apache Ignite Users mailing list archive
<http://apache-ignite-users.70518.x6.nabble.com/> at Nabble.com.


Re: Optimum CacheConfiguration for cross cache join SQL queries

2017-07-06 Thread vkulichenko
iostream wrote
> Hi,
> 
> I have 2 caches, each with POJO values that look like -
> 
> public class Value1 {
> 
>   @QuerySqlField(index = true)
>   private Long a;
> 
>   @QuerySqlField(index = true)
>   private String b;
> 
> }
> 
> public class Value2 {
> 
>   @QuerySqlField(index = true)
>   private Long a;
> 
>   @QuerySqlField(index = true)
>   private String c;
> 
> }
> 
> I have to run the following SQL query on my Ignite Cluster -
> 
> SELECT * from Value1 v1, Value2 v2 WHERE v1.a = v2.a
> 
> To test the query performance, I configured my Ignite Cluster to have only
> 1 server node and used 4 clients nodes to run the query. Added 50,000
> Value1 objects and around 800,000 Value2 objects in the cache. I used a
> single node cluster in order to avoid using affinity collocation problems
> when performing cross cache joins. The average response time of the query
> was around 150 seconds! :(
> 
> Server node - 4 CPUs, 8 GB Heap
> Client node - 4 CPUs / client, 1 GB Heap / client
> 
> I am quite surprised to see the SQL perform poorly. I am posting my
> cacheConfiguration for your reference (same for both the caches).
> 
> cacheConfig.setAtomicityMode("transactional");
> cacheConfig.setCacheMode("partitioned");
> cacheConfig.setBackups(1);
> cacheConfig.setCopyOnRead(false);
> cacheConfig.setIndexedTypes(Long.class, Value1.class);
> 
> I am wondering if the query performed poorly because of my cache
> configuration or is Ignite not very efficient with cross cache joins
> (given that I used only 1 node to avoid non-collcation)
> 
> Any help is appreciated!
> 
> Thank you.

First of all, your configuration doesn't look valid. Atomicity mode and
cache mode are enums not strings, code you provided will not compile.

As for performance, this query doesn't have any condition, so it scans at
least one of the tables. For such queries you should apply proper
collocation between two tables and scale it out by adding more nodes and/or
setting CacheConfiguration#queryParallelism to value more than 1. This will
parallelize execution and make it faster.

In addition I would check the execution and verify that it scans the smaller
table. If not, you can set SqlFieldsQuery#enforceJoinOrder to true and
rewrite the query to use explicit joins in preferred order.

And finally, my recommendation would be to pick a query that is closer to to
realistic use case. Queries like provided here will never be very fast as
they imply a lot of scanning. Scaling out is the only way to effectively
increase performance for them.

-Val



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Optimum-CacheConfiguration-for-cross-cache-join-SQL-queries-tp14371p14429.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Optimum CacheConfiguration for cross cache join SQL queries

2017-07-06 Thread vkulichenko
Hi,

Please properly subscribe to the mailing list so that the community can
receive email notifications for your messages. To subscribe, send empty
email to user-subscr...@ignite.apache.org and follow simple instructions in
the reply.





--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Optimum-CacheConfiguration-for-cross-cache-join-SQL-queries-tp14371p14428.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Poor cross cache join SQL performance (v2.0.0)

2017-07-06 Thread iostream
Hi Alex,

Yes Value2.class was already added to indextypes.

I am only reading the value which is why copyOnRead is set to "false".



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Poor-cross-cache-join-SQL-performance-v2-0-0-tp14373p14382.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Poor cross cache join SQL performance (v2.0.0)

2017-07-06 Thread afedotov
Hi,

Add Value2 to the list of the indexed classes (setIndexedTypes).
What is the reason for setting copyOnRead(false)?

If the problem still stands after the fix, please check GC or take a JFR.

Kind regards,
Alex

6 июля 2017 г. 7:29 AM пользователь "iostream [via Apache Ignite Users]" <
ml+s70518n14373...@n6.nabble.com> написал:

Hi,

I have 2 caches, each with POJO values that look like -

public class Value1 {

@QuerySqlField(index = true)
private Long a;

@QuerySqlField(index = true)
private String b;

}

public class Value2 {

@QuerySqlField(index = true)
private Long a;

@QuerySqlField(index = true)
private String c;

}

I have to run the following SQL query on my Ignite Cluster -

SELECT * from Value1 v1, Value2 v2 WHERE v1.a = v2.a

To test the query performance, I configured my Ignite Cluster to have only
1 server node and used 4 clients nodes to run the query. Added 50,000
Value1 objects and around 800,000 Value2 objects in the cache. I used a
single node cluster in order to avoid using affinity collocation problems
when performing cross cache joins. The average response time of the query
was around 150 seconds! :(

Server node - 4 CPUs, 8 GB Heap
Client node - 4 CPUs / client, 1 GB Heap / client

I am quite surprised to see the SQL perform poorly. I am posting my
cacheConfiguration for your reference (same for both the caches).

cacheConfig.setAtomicityMode("transactional");
cacheConfig.setCacheMode("partitioned");
cacheConfig.setBackups(1);
cacheConfig.setCopyOnRead(false);
cacheConfig.setIndexedTypes(Long.class, Value1.class);

I am wondering if the query performed poorly because of my cache
configuration or is Ignite not very efficient with cross cache joins (given
that I used only 1 node to avoid non-collcation)

Any help is appreciated!

Thank you.

--
If you reply to this email, your message will be added to the discussion
below:
http://apache-ignite-users.70518.x6.nabble.com/Poor-cross-cache-join-SQL-
performance-v2-0-0-tp14373.html
To start a new topic under Apache Ignite Users, email
ml+s70518n1...@n6.nabble.com
To unsubscribe from Apache Ignite Users, click here
<http://apache-ignite-users.70518.x6.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code=1=YWxleGFuZGVyLmZlZG90b2ZmQGdtYWlsLmNvbXwxfC0xMzYxNTU0NTg=>
.
NAML
<http://apache-ignite-users.70518.x6.nabble.com/template/NamlServlet.jtp?macro=macro_viewer=instant_html%21nabble%3Aemail.naml=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>




--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Poor-cross-cache-join-SQL-performance-v2-0-0-tp14373p14379.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.