One comment about your specific query.    You have a "left outer join" against 
the patientaddress table.  This can actually be written as an "inner join" with 
the same results since you are restricting the right side of the outer join 
with the where clause.   You can try your query as an inner join and see if 
that improves your performance.    The relational database engine probably 
optimized the query by re-writing it as an inner join.    If you did not have 
the "a.State='LA'", then it would have to remain an outer join.   The left 
outer join might be greatly effecting performance.    Since you are also using 
group by and order by, the total amount of results might also effect the 
performance.

select c.Patientid, ClaimNumber, FirstName, LastName, sum (PharmacyTotalCost) 
as PharmacyTotalCost
      from claim c inner join patient p on c.patientid = p.patientid
      left outer join patientaddress a on p.patientid = a.patientid
      where a.[State] = 'LA'
      group by  c.PatientId, ClaimNumber, FirstName, LastName  order by 
SUM(PharmacyTotalCost) desc limit 10

Note:  You can see a reference in mysql as an example about when outer joins 
can be re-written to inner joins:  
https://dev.mysql.com/doc/refman/5.0/en/outer-join-simplification.html

Brad Rix | Technical Lead
O: +1 303 542 2172 | M: +1 303 915 2771
Skype: Brad.Rix | Google Talk: 
bradford....@gmail.com<mailto:bradford....@gmail.com>

From: 
general-boun...@developer.marklogic.com<mailto:general-boun...@developer.marklogic.com>
 [mailto:general-boun...@developer.marklogic.com] On Behalf Of Sara Mazer
Sent: Monday, April 27, 2015 6:30 PM
To: MarkLogic Developer Discussion; ji...@rdacorp.com<mailto:ji...@rdacorp.com>
Cc: Scott Fowler; Walt Rolle
Subject: Re: [MarkLogic Dev General] SQL Query Performance and Tableau 
Connection

Hi Cynthia,
My name is Sara Mazer, and I am a solutions consultant in the DC area.  I've 
worked a lot with Tableau and MarkLogic and would like to set up some time to 
discuss your evaluation and give you some best practices so that you can get 
the best performance possible.  I was the person who worked with Tableau to add 
MarkLogic as a named connection and supported their certification of MarkLogic, 
and I think I can help you.

I have a too many best practices to go over by email, so I'd like to speak with 
you and walk you through setting up databases for successful analytics through 
WebEx.  Would that be possible?  I have a meeting from 9:30-11:00 but can move 
any other meeting I have tomorrow to suit your schedule, just let me know a 
good time for you.

In general, I plan on:
Recommending the best data model for MarkLogic of this type of data (hint: it 
doesn't look like what would be in an RDBMs)
Showing you a demo of patient and claims data with Tableau and review how it 
was done
Discussing the best practices of working with Tableau (minimize the data coming 
back from MarkLogic by using custom SQL for charts, using MATCH keyword, having 
Tableau do grouping/sorting instead of in SQL)
Reviewing common connection issues between MarkLogic and Tableau and how to 
avoid them (most likely you are corrupting your SQL views:a best practice is to 
have a unique schema database for each MarkLogic database)
Reviewing performance of XQuery/JavaScript vs. SQL in MarkLogic

Best regards,
Sara Mazer
sara dot mazer "at" marklogic dot com
This e-mail and any accompanying attachments are confidential. The information 
is intended solely for the use of the individual to whom it is addressed. Any 
review, disclosure, copying, distribution, or use of this e-mail communication 
by others is strictly prohibited. If you are not the intended recipient, please 
notify us immediately by returning this message to the sender and delete all 
copies. Thank you for your cooperation.



From: 
general-boun...@developer.marklogic.com<mailto:general-boun...@developer.marklogic.com>
 [mailto:general-boun...@developer.marklogic.com] On Behalf Of Cynthia Jiang
Sent: Monday, April 27, 2015 4:56 PM
To: General@developer.marklogic.com<mailto:General@developer.marklogic.com>
Cc: Walt Rolle
Subject: [MarkLogic Dev General] SQL Query Performance and Tableau Connection

Hello,

We have been testing MarkLogic 8 downloaded from your website for a POC.

Our environment is very straight forward.

A single VM (windows 7 Enterprise - 64bit) with 16GM RAM and 2 Intel (E5-2690 
v2) 3GHz processors.

We have around 4 SQL tables that we modeled in Json format so each row ends up 
becoming a document, and here are some of the stats:

Patient : 8 fields / 20,924 documents
PatientAddress: 5 fields / 36,193 documents
Claim: less than 30 fields / 335,349 documents
ClaimDiagnosis: 2 fields / 8,676 documents

We created element range index for all the fields, and created 4 views that 
each represent a SQL table.

Then we were able to use SQL to query the data and get the result in the query 
console.

But the performance is not nearly as what we expected, we thought it would work 
faster than SQL environment.

A simple query like the one below took 30 seconds to run, and the same query 
took 3 seconds to run with a larger data set in a SQL environment with 6GB 
memory and 4 processors.

select c.Patientid, ClaimNumber, FirstName, LastName, sum (PharmacyTotalCost) 
as PharmacyTotalCost
from claim c inner join patient p on c.patientid = p.patientid  left outer join 
patientaddress a on p.patientid = a.patientid
where a.[State] = 'LA'
group by  c.PatientId, ClaimNumber, FirstName, LastName  order by 
SUM(PharmacyTotalCost) desc limit 10

We also tested out the connection between MarkLogic and Tableau 9 trial version 
through MarkLogic 64 bit ODBC connector on the same VM.
The connection is not stable, got lost very frequently, and sometimes it won't 
stop running for 20 minutes after adding one dimension and one simple measure 
the columns and rows shelves.

We are under a very tight timeline and have to report our findings back to our 
client within 24 hours. With the current performance, we will not be able to 
recommend any MarkLogic product to our client.

Please help us if there is anything that we could do to improve the SQL query 
performance and data connectivity between Tableau and MarkLogic.

Also, for the query as the simple example above, is there any XQuery or 
Javascript Query that we can use that could produce the similar results? We 
cannot find any documented examples.

Any help will be really appreciated!

Thank you very much,

Cynthia Jiang
RDA Corpration


_______________________________________________
General mailing list
General@developer.marklogic.com
Manage your subscription at: 
http://developer.marklogic.com/mailman/listinfo/general

Reply via email to