[jira] [Updated] (PHOENIX-5035) phoenix-spark dataframe filters date or timestamp type with error

2018-11-20 Thread zhongyuhai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhongyuhai updated PHOENIX-5035:

Attachment: (was: patch)

> phoenix-spark dataframe filters date or timestamp type with error
> 
>
> Key: PHOENIX-5035
> URL: https://issues.apache.org/jira/browse/PHOENIX-5035
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.13.0, 4.14.0, 4.13.1, 5.0.0, 4.14.1
> Environment: HBase:apache 1.2
> Phoenix:4.13.1-HBase-1.2
> Hadoop:CDH 2.6
> Spark:2.3.1
>Reporter: zhongyuhai
>Priority: Critical
>  Labels: patch, pull-request-available
> Attachments: PHOENIX-5035.patch, table desc.png
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> *table description as follows:*
> see attached "table desc.png"
>  
> *code as follows:*
> val df = SparkUtil.getActiveSession().read.format("org.apache.phoenix.spark").options(options).load()
> df.filter("INCREATEDDATE = date'2018-07-14'")
>  
> *exception as follows:*
> java.lang.RuntimeException: org.apache.phoenix.schema.TypeMismatchException: ERROR 203 (22005): Type mismatch. DATE and BIGINT for "INCREATEDDATE" = 1997
>  at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:201)
>  at org.apache.phoenix.mapreduce.PhoenixInputFormat.getSplits(PhoenixInputFormat.java:87)
>  at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:127)
>  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
>  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
>  
> *analysis as follows:*
> In org.apache.phoenix.spark.PhoenixRelation.compileValue(value: Any): Any:
>  
>  
> {code:java}
> private def compileValue(value: Any): Any = {
>   value match {
>     case stringValue: String => s"'${escapeStringConstant(stringValue)}'"
>
>     // Borrowed from 'elasticsearch-hadoop', support these internal UTF types across Spark versions
>     // Spark 1.4
>     case utf if (isClass(utf, "org.apache.spark.sql.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'"
>     // Spark 1.5
>     case utf if (isClass(utf, "org.apache.spark.unsafe.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'"
>
>     // Pass through anything else
>     case _ => value
>   }
> }
> {code}
>  
> It only handles the String type; any other type falls through and is rendered via toString. This makes the Spark filter condition "INCREATEDDATE = date'2018-07-14'" translate into a Phoenix filter condition like "INCREATEDDATE = 2018-07-14". Phoenix parses the unquoted 2018-07-14 as the integer expression 2018 - 7 - 14 = 1997, so it throws ERROR 203 (22005): Type mismatch. DATE and BIGINT for "INCREATEDDATE" = 1997.
> *solution as follows:*
> Add handling for the other types, such as Date and Timestamp:
> {code:java}
> private def compileValue(value: Any): Any = {
>   value match {
>     case stringValue: String => s"'${escapeStringConstant(stringValue)}'"
>
>     // Borrowed from 'elasticsearch-hadoop', support these internal UTF types across Spark versions
>     // Spark 1.4
>     case utf if (isClass(utf, "org.apache.spark.sql.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'"
>     // Spark 1.5
>     case utf if (isClass(utf, "org.apache.spark.unsafe.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'"
>
>     // Render dates as quoted Phoenix DATE literals
>     case d if (isClass(d, "java.util.Date") || isClass(d, "java.sql.Date")) => {
>       val config: Configuration = HBaseFactoryProvider.getConfigurationFactory.getConfiguration
>       val dateFormat = config.get(QueryServices.DATE_FORMAT_ATTRIB, DateUtil.DEFAULT_DATE_FORMAT)
>       val df = new SimpleDateFormat(dateFormat)
>       s"date'${df.format(d)}'"
>     }
>
>     // Render timestamps as quoted Phoenix TIMESTAMP literals
>     case dt if (isClass(dt, "java.sql.Timestamp")) => {
>       val config: Configuration = HBaseFactoryProvider.getConfigurationFactory.getConfiguration
>       val dateTimeFormat = config.get(QueryServices.TIMESTAMP_FORMAT_ATTRIB, DateUtil.DEFAULT_TIMESTAMP_FORMAT)
>       val df = new SimpleDateFormat(dateTimeFormat)
>       s"timestamp'${df.format(dt)}'"
>     }
>
>     // Pass through anything else
>     case _ => value
>   }
> }
> {code}
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PHOENIX-5035) phoenix-spark dataframe filters date or timestamp type with error

2018-11-20 Thread zhongyuhai (JIRA)
zhongyuhai created PHOENIX-5035:
---

 Summary: phoenix-spark dataframe filters date or timestamp type with error
 Key: PHOENIX-5035
 URL: https://issues.apache.org/jira/browse/PHOENIX-5035
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.14.1, 5.0.0, 4.13.1, 4.14.0, 4.13.0
 Environment: HBase:apache 1.2

Phoenix:4.13.1-HBase-1.2

Hadoop:CDH 2.6

Spark:2.3.1
Reporter: zhongyuhai
 Attachments: table desc.png

*table description as follows:*

see attached "table desc.png"

 

*code as follows:*

val df = SparkUtil.getActiveSession().read.format("org.apache.phoenix.spark").options(options).load()

df.filter("INCREATEDDATE = date'2018-07-14'")

 

*exception as follows:*

java.lang.RuntimeException: org.apache.phoenix.schema.TypeMismatchException: ERROR 203 (22005): Type mismatch. DATE and BIGINT for "INCREATEDDATE" = 1997
 at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:201)
 at org.apache.phoenix.mapreduce.PhoenixInputFormat.getSplits(PhoenixInputFormat.java:87)
 at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:127)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)

 

*analysis as follows:*

In org.apache.phoenix.spark.PhoenixRelation.compileValue(value: Any): Any:

 

 
{code:java}
private def compileValue(value: Any): Any = {
  value match {
    case stringValue: String => s"'${escapeStringConstant(stringValue)}'"

    // Borrowed from 'elasticsearch-hadoop', support these internal UTF types across Spark versions
    // Spark 1.4
    case utf if (isClass(utf, "org.apache.spark.sql.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'"
    // Spark 1.5
    case utf if (isClass(utf, "org.apache.spark.unsafe.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'"

    // Pass through anything else
    case _ => value
  }
}
{code}
 

It only handles the String type; any other type falls through and is rendered via toString. This makes the Spark filter condition "INCREATEDDATE = date'2018-07-14'" translate into a Phoenix filter condition like "INCREATEDDATE = 2018-07-14". Phoenix parses the unquoted 2018-07-14 as the integer expression 2018 - 7 - 14 = 1997, so it throws ERROR 203 (22005): Type mismatch. DATE and BIGINT for "INCREATEDDATE" = 1997.

*solution as follows:*

Add handling for the other types, such as Date and Timestamp:
{code:java}
private def compileValue(value: Any): Any = {
  value match {
    case stringValue: String => s"'${escapeStringConstant(stringValue)}'"

    // Borrowed from 'elasticsearch-hadoop', support these internal UTF types across Spark versions
    // Spark 1.4
    case utf if (isClass(utf, "org.apache.spark.sql.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'"
    // Spark 1.5
    case utf if (isClass(utf, "org.apache.spark.unsafe.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'"

    // Render dates as quoted Phoenix DATE literals
    case d if (isClass(d, "java.util.Date") || isClass(d, "java.sql.Date")) => {
      val config: Configuration = HBaseFactoryProvider.getConfigurationFactory.getConfiguration
      val dateFormat = config.get(QueryServices.DATE_FORMAT_ATTRIB, DateUtil.DEFAULT_DATE_FORMAT)
      val df = new SimpleDateFormat(dateFormat)
      s"date'${df.format(d)}'"
    }

    // Render timestamps as quoted Phoenix TIMESTAMP literals
    case dt if (isClass(dt, "java.sql.Timestamp")) => {
      val config: Configuration = HBaseFactoryProvider.getConfigurationFactory.getConfiguration
      val dateTimeFormat = config.get(QueryServices.TIMESTAMP_FORMAT_ATTRIB, DateUtil.DEFAULT_TIMESTAMP_FORMAT)
      val df = new SimpleDateFormat(dateTimeFormat)
      s"timestamp'${df.format(dt)}'"
    }

    // Pass through anything else
    case _ => value
  }
}
{code}
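To illustrate the effect of the proposed change, here is a minimal self-contained Java sketch. It hard-codes the default Phoenix patterns yyyy-MM-dd and yyyy-MM-dd HH:mm:ss.SSS as an assumption; the patch itself reads the patterns from QueryServices configuration instead. It compares the broken toString rendering with the quoted literals the patch emits:

```java
import java.sql.Timestamp;
import java.text.SimpleDateFormat;

public class DateLiteralSketch {
    public static void main(String[] args) {
        // Assumed default patterns; the patch looks these up via
        // QueryServices.DATE_FORMAT_ATTRIB / TIMESTAMP_FORMAT_ATTRIB.
        SimpleDateFormat dateFmt = new SimpleDateFormat("yyyy-MM-dd");
        SimpleDateFormat tsFmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");

        java.sql.Date d = java.sql.Date.valueOf("2018-07-14");
        Timestamp ts = Timestamp.valueOf("2018-07-14 10:30:00");

        // Before the patch: raw toString yields an unquoted literal,
        // which Phoenix evaluates as the arithmetic 2018 - 7 - 14 = 1997.
        System.out.println("broken: INCREATEDDATE = " + d);

        // After the patch: properly quoted DATE / TIMESTAMP literals.
        System.out.println("fixed:  INCREATEDDATE = date'" + dateFmt.format(d) + "'");
        System.out.println("fixed:  INCREATEDDATE = timestamp'" + tsFmt.format(ts) + "'");
    }
}
```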
 

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PHOENIX-5034) Log all critical statements in SYSTEM.LOG table.

2018-11-20 Thread Xu Cang (JIRA)
Xu Cang created PHOENIX-5034:


 Summary: Log all critical statements in SYSTEM.LOG table.
 Key: PHOENIX-5034
 URL: https://issues.apache.org/jira/browse/PHOENIX-5034
 Project: Phoenix
  Issue Type: Improvement
Reporter: Xu Cang
Assignee: Xu Cang


In production, engineers sometimes find that a table was dropped unexpectedly. It is not easy to scan the raw table from HBase itself to understand what happened and when the table was dropped.

Phoenix already has the SYSTEM.LOG query log facility, which samples query statements (logging 1% of statements by default). It would be good to always log critical statements such as DROP or ALTER.
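A minimal sketch of the proposed behavior (the class name, method signature, and keyword list here are hypothetical, not Phoenix's actual query-log API): critical DDL bypasses the sampling roll, while everything else keeps the default 1% sample rate.

```java
import java.util.regex.Pattern;

public class CriticalStatementFilter {
    // Hypothetical keyword list; the real change would hook into
    // Phoenix's query-log sampling path.
    private static final Pattern CRITICAL =
            Pattern.compile("^\\s*(DROP|ALTER)\\b", Pattern.CASE_INSENSITIVE);

    // Log when the statement is critical DDL, or when the sampling
    // roll (a uniform random draw in [0, 1)) falls under the rate.
    public static boolean shouldLog(String sql, double sampleRate, double roll) {
        return CRITICAL.matcher(sql).find() || roll < sampleRate;
    }
}
```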

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-4757) composite key salt_buckets

2018-11-20 Thread Gerald Sangudi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gerald Sangudi updated PHOENIX-4757:

Comment: was deleted

(was: [~tdsilva] - id_1 and id_2 are not constant. They are meaningful in the 
application domain, and they vary by record. There would be no write 
hotspotting if new writes arrive in random order with respect to id_2.)

> composite key salt_buckets
> --
>
> Key: PHOENIX-4757
> URL: https://issues.apache.org/jira/browse/PHOENIX-4757
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.11.0
>Reporter: cmd
>Priority: Major
> Fix For: 4.11.0
>
>
> CREATE TABLE IF NOT EXISTS user_events (
>  user_id VARCHAR NOT NULL,
>  event_type VARCHAR NOT NULL,
>  event_time VARCHAR NOT NULL,
>  event_msg VARCHAR NOT NULL,
>  event_status VARCHAR NOT NULL,
>  event_opt VARCHAR NOT NULL,
>  CONSTRAINT my_pk PRIMARY KEY (user_id, event_type, event_time))
> SALT_BUCKETS=128;
> and my queries are:
>  1. select event_type, count(0) from user_events where user_id='' group by event_type
>  2. select count(0) from user_events where user_id='' and event_type='0101'
>  3. select * from user_events where user_id='' and event_type='0101' and event_time>'20180101' and event_time<'20180201' order by event_time limit 50,100
> Query concurrency ratio:
>  1: 80%
>  2: 10%
>  3: 10%
>  user_events data: 50 billion rows
>  It would help if a field/some fields of the primary key could be salted by hash,
>  with grammar like "SALT_BUCKETS(user_id)=4" or "SALT_BUCKETS(user_id,event_type)=4"
> ref:
>  
> [https://www.safaribooksonline.com/library/view/greenplum-architecture/9781940540337/xhtml/chapter03.xhtml]
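The proposal above can be sketched in plain Java (the hash below is an illustration, not Phoenix's actual SaltingUtil algorithm): deriving the salt bucket from only the named PK column(s), e.g. user_id, means every row for a given user lands in the same bucket, so queries 1-3 above would each touch a single bucket instead of all 128.

```java
import java.nio.charset.StandardCharsets;

public class SubsetSaltSketch {
    // Derive the salt bucket from a chosen subset of the PK (here just
    // user_id) rather than the full row key, so all rows for one user
    // share a bucket.
    public static int saltBucket(String userId, int buckets) {
        byte[] key = userId.getBytes(StandardCharsets.UTF_8);
        int hash = 0;
        for (byte b : key) {
            hash = 31 * hash + b;   // simple illustrative byte hash
        }
        return Math.floorMod(hash, buckets);
    }
}
```

Because rows sharing a user_id share a bucket, a range scan over event_time for one user stays within a single bucket; the trade-off is that a very hot user can still hotspot its bucket.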



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (PHOENIX-4764) Cleanup metadata of child views for a base table that has been dropped

2018-11-20 Thread Thomas D'Silva (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas D'Silva resolved PHOENIX-4764.
-
Resolution: Fixed

> Cleanup metadata of child views for a base table that has been dropped
> --
>
> Key: PHOENIX-4764
> URL: https://issues.apache.org/jira/browse/PHOENIX-4764
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Thomas D'Silva
>Assignee: Kadir OZDEMIR
>Priority: Blocker
> Fix For: 4.15.0, 5.1.0
>
> Attachments: 
> 0001-PHOENIX-4764-Revised-fix-based-on-the-review-comment.patch, 
> PHOENIX-4764.4.x-HBase-1.4.0001.patch, PHOENIX-4764.4.x-HBase-1.4.0002.patch, 
> PHOENIX-4764.4.x-HBase-1.4.0003.patch, PHOENIX-4764.4.x-HBase-1.4.0004.patch, 
> PHOENIX-4764.4.x-HBase-1.4.0005.patch, PHOENIX-4764.master.0001.patch, 
> PHOENIX-4764.master.0002.patch, PHOENIX-4764.master.0003.patch, 
> PHOENIX-4764.master.0003.patch, PHOENIX-4764.master.0004.patch, 
> PHOENIX-4764.master.0005.patch, PHOENIX-4764.master.0006.patch
>
>
> Add a new SYSTEM.TASK table that is used to keep track of tables which have 
> been dropped but whose child view metadata hasn't been deleted yet. 
> Add a scheduled task that queries this table periodically and drops the child 
> view metadata. 
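The described mechanism can be sketched as a periodic poller (names here are illustrative; the actual implementation lives in Phoenix's SYSTEM.TASK handling):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ViewCleanupSketch {
    // Poll the task table on a fixed delay; each run would read pending
    // "dropped table" entries and delete the orphaned child-view metadata.
    public static ScheduledExecutorService start(Runnable pollTaskTable, long intervalMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(pollTaskTable, 0, intervalMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```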



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-4983) Allow using a connection with a SCN set to write data to tables EXCEPT transactional tables or mutable tables with indexes or tables with a ROW_TIMESTAMP column

2018-11-20 Thread Thomas D'Silva (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas D'Silva updated PHOENIX-4983:

Description: 
Currently, if an SCN is set on a connection, it is read-only. We only need to 
prevent a client from using a connection with an SCN set to upsert data for:
1) transactional tables 
2) mutable tables with indexes 
3) tables with a ROW_TIMESTAMP column

  was:Currently If a SCN set on a connection it is read-only. We only need to 
prevent a client from setting the timestamp for transactional tables or mutable 
tables with global and local indexes.


> Allow using a connection with a SCN set to write data to tables EXCEPT 
> transactional tables or mutable tables with indexes or tables with a 
> ROW_TIMESTAMP column
> 
>
> Key: PHOENIX-4983
> URL: https://issues.apache.org/jira/browse/PHOENIX-4983
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: Thomas D'Silva
>Assignee: Swaroopa Kadam
>Priority: Major
>  Labels: SFDC
>
> Currently, if an SCN is set on a connection, it is read-only. We only need to 
> prevent a client from using a connection with an SCN set to upsert data for:
> 1) transactional tables 
> 2) mutable tables with indexes 
> 3) tables with a ROW_TIMESTAMP column



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-4983) Allow using a connection with a SCN set to write data to tables EXCEPT transactional tables or mutable tables with indexes or tables with a ROW_TIMESTAMP column

2018-11-20 Thread Thomas D'Silva (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas D'Silva updated PHOENIX-4983:

Summary: Allow using a connection with a SCN set to write data to tables 
EXCEPT transactional tables or mutable tables with indexes or tables with a 
ROW_TIMESTAMP column  (was: Allow using a connection with a SCN set to write 
data to tables EXCEPT transactional tables or mutable tables with indexes)

> Allow using a connection with a SCN set to write data to tables EXCEPT 
> transactional tables or mutable tables with indexes or tables with a 
> ROW_TIMESTAMP column
> 
>
> Key: PHOENIX-4983
> URL: https://issues.apache.org/jira/browse/PHOENIX-4983
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: Thomas D'Silva
>Assignee: Swaroopa Kadam
>Priority: Major
>  Labels: SFDC
>
> Currently, if an SCN is set on a connection, it is read-only. We only need to 
> prevent a client from setting the timestamp for transactional tables or 
> mutable tables with global and local indexes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PHOENIX-5033) connect() method in PhoenixDriver should catch exception properly

2018-11-20 Thread Xu Cang (JIRA)
Xu Cang created PHOENIX-5033:


 Summary: connect() method in PhoenixDriver should catch exception 
properly
 Key: PHOENIX-5033
 URL: https://issues.apache.org/jira/browse/PHOENIX-5033
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.13.0
Reporter: Xu Cang


See this error in production:

 

Problem executing query. *Stack trace: java.lang.IllegalMonitorStateException: 
attempt to unlock read lock, not locked by current thread*

at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.unmatchedUnlockException(ReentrantReadWriteLock.java:444)
at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryReleaseShared(ReentrantReadWriteLock.java:428)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(AbstractQueuedSynchronizer.java:1341)
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.unlock(ReentrantReadWriteLock.java:881)
at org.apache.phoenix.jdbc.PhoenixDriver.unlock(PhoenixDriver.java:346)
at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:223)
at phoenix.connection.ProtectedPhoenixConnectionFactory$PhoenixConnectionFactory.createPhoenixConnection(ProtectedPhoenixConnectionFactory.java:233)
at phoenix.connection.ProtectedPhoenixConnectionFactory.create(ProtectedPhoenixConnectionFactory.java:95)
at phoenix.util.PhoenixConnectionUtil.getConnection(PhoenixConnectionUtil.java:59)
at phoenix.util.PhoenixConnectionUtil.getConnection(PhoenixConnectionUtil.java:48)
at pliny.db.PhoenixConnectionProviderImpl$ConnectionType$1.getConnection(PhoenixConnectionProviderImpl.java:158)
at pliny.db.PhoenixConnectionProviderImpl.getGenericConnection(PhoenixConnectionProviderImpl.java:67)
at communities.util.db.phoenix.ManagedPhoenixConnection.createManagedGenericConnection(ManagedPhoenixConnection.java:73)
at communities.util.db.phoenix.ManagedPhoenixConnection.getGenericConnectionForAsyncOperation(ManagedPhoenixConnection.java:51)
at communities.util.db.phoenix.AbstractAsyncPhoenixRequest.call(AbstractAsyncPhoenixRequest.java:183)
at core.chatter.feeds.read.FeedEntityReadByUserPhoenixQuery.call(FeedEntityReadByUserPhoenixQuery.java:66)
at communities.util.db.phoenix.AbstractAsyncPhoenixRequest.call(AbstractAsyncPhoenixRequest.java:1)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

 

 

Questionable code:

 

 
{code:java}
@Override
public Connection connect(String url, Properties info) throws SQLException {
    if (!acceptsURL(url)) {
        return null;
    }
    try {
        lockInterruptibly(LockMode.READ);
        checkClosed();
        return createConnection(url, info);
    } finally {
        unlock(LockMode.READ);
    }
}
{code}
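The stack trace fits that shape: lockInterruptibly(LockMode.READ) sits inside the try block, so if acquisition itself fails (for example, the thread is interrupted), the finally block still calls unlock on a lock this thread never held, producing the IllegalMonitorStateException. A minimal self-contained sketch of the hazard and the conventional fix (class and method names here are illustrative, not PhoenixDriver's actual code):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockPatternSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Broken shape from the report: if acquiring the lock throws, the
    // finally block still tries to unlock a lock this thread never held.
    public String broken() throws InterruptedException {
        try {
            lock.readLock().lockInterruptibly();  // may throw before the lock is held
            return "connected";
        } finally {
            lock.readLock().unlock();             // IllegalMonitorStateException if acquire failed
        }
    }

    // Safer shape: enter try/finally only after the lock is held.
    public String fixed() throws InterruptedException {
        lock.readLock().lockInterruptibly();
        try {
            return "connected";
        } finally {
            lock.readLock().unlock();
        }
    }
}
```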



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-4765) Add client and server side config property to enable rollback of splittable System Catalog if required

2018-11-20 Thread Thomas D'Silva (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas D'Silva updated PHOENIX-4765:

Attachment: PHOENIX-4765-v1.patch

> Add client and server side config property to enable rollback of splittable 
> System Catalog if required
> --
>
> Key: PHOENIX-4765
> URL: https://issues.apache.org/jira/browse/PHOENIX-4765
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Thomas D'Silva
>Assignee: Thomas D'Silva
>Priority: Blocker
> Attachments: PHOENIX-4765-v1.patch
>
>
> After the server has been upgraded we will have a client and server side 
> config property that will allow us to roll back the upgrade if required. This 
> config will:
> 1. Continue storing parent column metadata along with a child view 
> 2. Disallow metadata changes to a base table that require being propagated to 
> child views.
> 3. Prevent SYSTEM.CATALOG from splitting.
> If the client is older than the server we also disallow metadata changes to a 
> base table with child views since we no longer lock the parent on the server 
> side. This is handled on the client side as part of PHOENIX-3534.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PHOENIX-5032) add Apache Yetus to Phoenix

2018-11-20 Thread Artem Ervits (JIRA)
Artem Ervits created PHOENIX-5032:
-

 Summary: add Apache Yetus to Phoenix
 Key: PHOENIX-5032
 URL: https://issues.apache.org/jira/browse/PHOENIX-5032
 Project: Phoenix
  Issue Type: Task
Reporter: Artem Ervits
Assignee: Artem Ervits


Spoke with [~elserj]; Phoenix will benefit greatly from Yetus.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)