[jira] [Updated] (PHOENIX-5035) phoenix-spark dataframe filters date or timestamp type with error
[ https://issues.apache.org/jira/browse/PHOENIX-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhongyuhai updated PHOENIX-5035: Attachment: (was: patch) > phoenix-spark dataframe filters date or timestamp type with error > > > Key: PHOENIX-5035 > URL: https://issues.apache.org/jira/browse/PHOENIX-5035 > Project: Phoenix > Issue Type: Bug > Affects Versions: 4.13.0, 4.14.0, 4.13.1, 5.0.0, 4.14.1 > Environment: HBase: apache 1.2 > Phoenix: 4.13.1-HBase-1.2 > Hadoop: CDH 2.6 > Spark: 2.3.1 > Reporter: zhongyuhai > Priority: Critical > Labels: patch, pull-request-available > Attachments: PHOENIX-5035.patch, table desc.png > > Original Estimate: 0h > Remaining Estimate: 0h > > *table description:* > see attached "table desc.png" > > *code:* > val df = SparkUtil.getActiveSession().read.format( > "org.apache.phoenix.spark").options(options).load() > df.filter("INCREATEDDATE = date'2018-07-14'") > > *exception:* > java.lang.RuntimeException: org.apache.phoenix.schema.TypeMismatchException: > ERROR 203 (22005): Type mismatch. 
DATE and BIGINT for "INCREATEDDATE" = 1997 > at > org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:201) > at > org.apache.phoenix.mapreduce.PhoenixInputFormat.getSplits(PhoenixInputFormat.java:87) > at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:127) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251) > > *analysis:* > In org.apache.phoenix.spark.PhoenixRelation.compileValue(value: Any): Any > > > {code:java} > private def compileValue(value: Any): Any = { > value match { > case stringValue: String => s"'${escapeStringConstant(stringValue)}'" > // Borrowed from 'elasticsearch-hadoop', support these internal UTF types > across Spark versions > // Spark 1.4 > case utf if (isClass(utf, "org.apache.spark.sql.types.UTF8String")) => > s"'${escapeStringConstant(utf.toString)}'" > // Spark 1.5 > case utf if (isClass(utf, "org.apache.spark.unsafe.types.UTF8String")) => > s"'${escapeStringConstant(utf.toString)}'" > > // Pass through anything else > case _ => value > } > {code} > > It only handles the String type; any other type falls back to its toString. This makes the Spark filter condition "INCREATEDDATE = date'2018-07-14'" translate to a Phoenix filter condition like "INCREATEDDATE = 2018-07-14", so Phoenix cannot parse this syntax and throws the exception ERROR 203 (22005): Type mismatch. 
DATE and BIGINT for "INCREATEDDATE" = 1997. > *solution:* > add handling for other types such as Date and Timestamp > {code:java} > private def compileValue(value: Any): Any = { > value match { > case stringValue: String => s"'${escapeStringConstant(stringValue)}'" > // Borrowed from 'elasticsearch-hadoop', support these internal UTF types > across Spark versions > // Spark 1.4 > case utf if (isClass(utf, "org.apache.spark.sql.types.UTF8String")) => > s"'${escapeStringConstant(utf.toString)}'" > // Spark 1.5 > case utf if (isClass(utf, "org.apache.spark.unsafe.types.UTF8String")) => > s"'${escapeStringConstant(utf.toString)}'" > case d if (isClass(d, "java.util.Date") || isClass(d, "java.sql.Date")) => { > val config: Configuration = > HBaseFactoryProvider.getConfigurationFactory.getConfiguration > val dateFormat = config.get(QueryServices.DATE_FORMAT_ATTRIB, > DateUtil.DEFAULT_DATE_FORMAT) > val df = new SimpleDateFormat(dateFormat) > s"date'${df.format(d)}'" > } > case dt if (isClass(dt, "java.sql.Timestamp")) => { > val config: Configuration = > HBaseFactoryProvider.getConfigurationFactory.getConfiguration > val dateTimeFormat = config.get(QueryServices.TIMESTAMP_FORMAT_ATTRIB, > DateUtil.DEFAULT_TIMESTAMP_FORMAT) > val df = new SimpleDateFormat(dateTimeFormat) > s"timestamp'${df.format(dt)}'" > } > // Pass through anything else > case _ => value > } > } > {code} > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
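To illustrate the fix independently of Phoenix internals, here is a minimal Java sketch of the same idea: a java.sql.Date or java.sql.Timestamp filter value must be rendered as a Phoenix date'...'/timestamp'...' literal instead of relying on toString. The class name, hard-coded patterns, and the quote-doubling escape are assumptions for illustration only; the actual patch reads its formats from the QueryServices configuration.

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.text.SimpleDateFormat;

public class LiteralCompiler {
    // Hypothetical default patterns; Phoenix derives its own from configuration.
    private static final String DATE_PATTERN = "yyyy-MM-dd";
    private static final String TS_PATTERN = "yyyy-MM-dd HH:mm:ss.SSS";

    // Render a pushed-down filter value as a Phoenix-parseable literal.
    static String compileValue(Object value) {
        if (value instanceof Timestamp) {
            // Checked first: java.sql.Timestamp extends java.util.Date,
            // though not java.sql.Date.
            return "timestamp'" + new SimpleDateFormat(TS_PATTERN).format(value) + "'";
        }
        if (value instanceof Date) {
            return "date'" + new SimpleDateFormat(DATE_PATTERN).format(value) + "'";
        }
        if (value instanceof String) {
            // Double embedded single quotes, as SQL string literals require.
            return "'" + ((String) value).replace("'", "''") + "'";
        }
        // Pass through anything else (numbers, etc.)
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(compileValue(Date.valueOf("2018-07-14")));
        System.out.println(compileValue("O'Brien"));
        System.out.println(compileValue(42));
    }
}
```

With this shape, the Spark filter `INCREATEDDATE = date'2018-07-14'` survives the round trip instead of degrading to a bare `2018-07-14` arithmetic expression.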
[jira] [Created] (PHOENIX-5035) phoenix-spark dataframe filters date or timestamp type with error
zhongyuhai created PHOENIX-5035: --- Summary: phoenix-spark dataframe filters date or timestamp type with error Key: PHOENIX-5035 URL: https://issues.apache.org/jira/browse/PHOENIX-5035 Project: Phoenix Issue Type: Bug Affects Versions: 4.14.1, 5.0.0, 4.13.1, 4.14.0, 4.13.0 Environment: HBase: apache 1.2 Phoenix: 4.13.1-HBase-1.2 Hadoop: CDH 2.6 Spark: 2.3.1 Reporter: zhongyuhai Attachments: table desc.png *table description:* see attached "table desc.png" *code:* val df = SparkUtil.getActiveSession().read.format( "org.apache.phoenix.spark").options(options).load() df.filter("INCREATEDDATE = date'2018-07-14'") *exception:* java.lang.RuntimeException: org.apache.phoenix.schema.TypeMismatchException: ERROR 203 (22005): Type mismatch. DATE and BIGINT for "INCREATEDDATE" = 1997 at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:201) at org.apache.phoenix.mapreduce.PhoenixInputFormat.getSplits(PhoenixInputFormat.java:87) at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:127) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251) *analysis:* In org.apache.phoenix.spark.PhoenixRelation.compileValue(value: Any): Any {code:java} private def compileValue(value: Any): Any = { value match { case stringValue: String => s"'${escapeStringConstant(stringValue)}'" // Borrowed from 'elasticsearch-hadoop', support these internal UTF types across Spark versions // Spark 1.4 case utf if (isClass(utf, "org.apache.spark.sql.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'" // Spark 1.5 case utf if (isClass(utf, "org.apache.spark.unsafe.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'" // Pass through anything else case _ => value } {code} It only handles the String type; any other type falls back to its toString. This makes the Spark filter condition "INCREATEDDATE = date'2018-07-14'" 
translate to a Phoenix filter condition like "INCREATEDDATE = 2018-07-14", so Phoenix cannot parse this syntax and throws the exception ERROR 203 (22005): Type mismatch. DATE and BIGINT for "INCREATEDDATE" = 1997. *solution:* add handling for other types such as Date and Timestamp {code:java} private def compileValue(value: Any): Any = { value match { case stringValue: String => s"'${escapeStringConstant(stringValue)}'" // Borrowed from 'elasticsearch-hadoop', support these internal UTF types across Spark versions // Spark 1.4 case utf if (isClass(utf, "org.apache.spark.sql.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'" // Spark 1.5 case utf if (isClass(utf, "org.apache.spark.unsafe.types.UTF8String")) => s"'${escapeStringConstant(utf.toString)}'" case d if (isClass(d, "java.util.Date") || isClass(d, "java.sql.Date")) => { val config: Configuration = HBaseFactoryProvider.getConfigurationFactory.getConfiguration val dateFormat = config.get(QueryServices.DATE_FORMAT_ATTRIB, DateUtil.DEFAULT_DATE_FORMAT) val df = new SimpleDateFormat(dateFormat) s"date'${df.format(d)}'" } case dt if (isClass(dt, "java.sql.Timestamp")) => { val config: Configuration = HBaseFactoryProvider.getConfigurationFactory.getConfiguration val dateTimeFormat = config.get(QueryServices.TIMESTAMP_FORMAT_ATTRIB, DateUtil.DEFAULT_TIMESTAMP_FORMAT) val df = new SimpleDateFormat(dateTimeFormat) s"timestamp'${df.format(dt)}'" } // Pass through anything else case _ => value } } {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PHOENIX-5034) Log all critical statements in SYSTEM.LOG table.
Xu Cang created PHOENIX-5034: Summary: Log all critical statements in SYSTEM.LOG table. Key: PHOENIX-5034 URL: https://issues.apache.org/jira/browse/PHOENIX-5034 Project: Phoenix Issue Type: Improvement Reporter: Xu Cang Assignee: Xu Cang In production, engineers sometimes see a table get dropped unexpectedly. It's not easy to scan the raw table from HBase itself to understand what happened and when the table was dropped. Since we already have the SYSTEM.LOG query log facility in Phoenix that samples query statements (logging 1% of statements by default), it would be good to always log critical statements such as "DROP" or "ALTER" statements. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
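The proposal above amounts to a short-circuit in front of the sampling decision. A hypothetical sketch (class and method names invented for illustration; Phoenix's actual query-log code is not reproduced here):

```java
import java.util.Locale;
import java.util.concurrent.ThreadLocalRandom;

public class QueryLogSampler {
    // Hypothetical helper: DDL statements that should always be logged.
    static boolean isCriticalStatement(String sql) {
        String s = sql.trim().toUpperCase(Locale.ROOT);
        return s.startsWith("DROP") || s.startsWith("ALTER");
    }

    // Log when the statement is critical, OR when it wins the sampling
    // lottery (default rate would be 0.01, i.e. 1%).
    static boolean shouldLog(String sql, double sampleRate) {
        return isCriticalStatement(sql)
                || ThreadLocalRandom.current().nextDouble() < sampleRate;
    }
}
```

With this shape, a `DROP TABLE` is recorded even at a 0% sample rate, while `SELECT` traffic stays subject to sampling.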
[jira] [Updated] (PHOENIX-4757) composite key salt_buckets
[ https://issues.apache.org/jira/browse/PHOENIX-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerald Sangudi updated PHOENIX-4757: Comment: was deleted (was: [~tdsilva] - id_1 and id_2 are not constant. They are meaningful in the application domain, and they vary by record. There would be no write hotspotting if new writes arrive in random order wrt id_2.) > composite key salt_buckets > -- > > Key: PHOENIX-4757 > URL: https://issues.apache.org/jira/browse/PHOENIX-4757 > Project: Phoenix > Issue Type: Improvement > Affects Versions: 4.11.0 > Reporter: cmd > Priority: Major > Fix For: 4.11.0 > > > CREATE TABLE IF NOT EXISTS user_events ( > user_id VARCHAR NOT NULL, > event_type VARCHAR NOT NULL, > event_time VARCHAR NOT NULL, > event_msg VARCHAR NOT NULL, > event_status VARCHAR NOT NULL, > event_opt VARCHAR NOT NULL, > CONSTRAINT my_pk PRIMARY KEY (user_id,event_type,event_time)) > SALT_BUCKETS=128; > and my queries are: > 1. select event_type,count(0) from us_population where user_id='' group > by event_type > 2. select count(0) from us_population where user_id='' and > event_type='0101' > 3. select * from us_population where user_id='' and event_type='0101' and > event_time>'20180101' and event_time<'20180201' order by event_time limit > 50,100 > Concurrency query ratio: > 1: 80% > 2: 10% > 3: 10% > user_events data: 50 billion rows > It could allow a field or some fields of the primary key to be salted by > hash, with grammar like "SALT_BUCKETS(user_id)=4" or > "SALT_BUCKETS(user_id,event_type)=4" > ref: > > [https://www.safaribooksonline.com/library/view/greenplum-architecture/9781940540337/xhtml/chapter03.xhtml] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
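For context on the request: Phoenix salting prepends a bucket byte derived from a hash of the row key, so a query by user_id alone must fan out to all 128 buckets. Hashing only the named leading PK columns would keep every row for one user_id in a single bucket. A hypothetical sketch of that bucket computation (the hash function and class are illustrative, not Phoenix's actual salting code):

```java
import java.nio.charset.StandardCharsets;

public class PrefixSalter {
    // Compute a salt bucket from the selected leading PK columns only,
    // so all rows sharing those column values land in the same bucket.
    static int saltBucket(int buckets, String... saltColumns) {
        int h = 0;
        for (String col : saltColumns) {
            for (byte b : col.getBytes(StandardCharsets.UTF_8)) {
                h = 31 * h + b;  // simple polynomial rolling hash
            }
        }
        return Math.floorMod(h, buckets);  // non-negative bucket index
    }
}
```

Under such a scheme, queries 1-3 above (all anchored on user_id) would each touch exactly one bucket instead of fanning out across all of them, at the cost of less write spreading when one user dominates the write load.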
[jira] [Resolved] (PHOENIX-4764) Cleanup metadata of child views for a base table that has been dropped
[ https://issues.apache.org/jira/browse/PHOENIX-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas D'Silva resolved PHOENIX-4764. - Resolution: Fixed > Cleanup metadata of child views for a base table that has been dropped > -- > > Key: PHOENIX-4764 > URL: https://issues.apache.org/jira/browse/PHOENIX-4764 > Project: Phoenix > Issue Type: Sub-task > Reporter: Thomas D'Silva > Assignee: Kadir OZDEMIR > Priority: Blocker > Fix For: 4.15.0, 5.1.0 > > Attachments: > 0001-PHOENIX-4764-Revised-fix-based-on-the-review-comment.patch, > PHOENIX-4764.4.x-HBase-1.4.0001.patch, PHOENIX-4764.4.x-HBase-1.4.0002.patch, > PHOENIX-4764.4.x-HBase-1.4.0003.patch, PHOENIX-4764.4.x-HBase-1.4.0004.patch, > PHOENIX-4764.4.x-HBase-1.4.0005.patch, PHOENIX-4764.master.0001.patch, > PHOENIX-4764.master.0002.patch, PHOENIX-4764.master.0003.patch, > PHOENIX-4764.master.0004.patch, > PHOENIX-4764.master.0005.patch, PHOENIX-4764.master.0006.patch > > > Add a new SYSTEM.TASK table that is used to keep track of tables which have > been dropped but whose child view metadata hasn't been deleted yet. > Add a scheduled task that queries this table periodically and drops the child > view metadata. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
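The two-part design (a task table plus a periodic sweeper) can be sketched with an in-memory stand-in; here SYSTEM.TASK is modeled as a queue, and the real implementation would instead query the table and delete child-view rows from SYSTEM.CATALOG (all names below are hypothetical):

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class ChildViewCleaner {
    // Stand-in for SYSTEM.TASK rows naming dropped base tables.
    final Queue<String> pendingDroppedTables = new ArrayDeque<>();
    int cleanedCount = 0;

    // Recorded when a base table is dropped but its child-view
    // metadata cannot be cleaned up synchronously.
    void enqueueDroppedTable(String tableName) {
        pendingDroppedTables.add(tableName);
    }

    // One scheduled poll cycle: drain pending tasks and clean up
    // the child-view metadata for each dropped table.
    void pollOnce() {
        String table;
        while ((table = pendingDroppedTables.poll()) != null) {
            cleanedCount++; // real code would delete metadata rows here
        }
    }
}
```

In the real system `pollOnce` would run on a schedule (e.g. a ScheduledThreadPoolExecutor or an HBase chore), which is what makes the drop itself fast while keeping the catalog eventually consistent.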
[jira] [Updated] (PHOENIX-4983) Allow using a connection with a SCN set to write data to tables EXCEPT transactional tables or mutable tables with indexes or tables with a ROW_TIMESTAMP column
[ https://issues.apache.org/jira/browse/PHOENIX-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas D'Silva updated PHOENIX-4983: Description: Currently, if a SCN is set on a connection, it is read-only. We only need to prevent a client from using a connection with a SCN set to upsert data for: 1) transactional tables 2) mutable tables with indexes 3) tables with a ROW_TIMESTAMP column was: Currently, if a SCN is set on a connection, it is read-only. We only need to prevent a client from setting the timestamp for transactional tables or mutable tables with global and local indexes. > Allow using a connection with a SCN set to write data to tables EXCEPT > transactional tables or mutable tables with indexes or tables with a > ROW_TIMESTAMP column > > > Key: PHOENIX-4983 > URL: https://issues.apache.org/jira/browse/PHOENIX-4983 > Project: Phoenix > Issue Type: New Feature > Reporter: Thomas D'Silva > Assignee: Swaroopa Kadam > Priority: Major > Labels: SFDC > > Currently, if a SCN is set on a connection, it is read-only. We only need to > prevent a client from using a connection with a SCN set to upsert data for: > 1) transactional tables > 2) mutable tables with indexes > 3) tables with a ROW_TIMESTAMP column -- This message was sent by Atlassian JIRA (v7.6.3#76005)
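The three-case rule in the description condenses into a single predicate. A hypothetical sketch (field and class names invented; Phoenix's real check would live in its upsert compilation path against PTable metadata):

```java
public class ScnWriteCheck {
    // Hypothetical table descriptor covering the three restricted cases.
    static class TableInfo {
        final boolean transactional;
        final boolean mutableWithIndex;
        final boolean hasRowTimestamp;

        TableInfo(boolean transactional, boolean mutableWithIndex, boolean hasRowTimestamp) {
            this.transactional = transactional;
            this.mutableWithIndex = mutableWithIndex;
            this.hasRowTimestamp = hasRowTimestamp;
        }
    }

    // Upserts over an SCN connection are allowed unless the table falls
    // into one of the three cases listed in the issue description.
    static boolean upsertAllowed(Long scn, TableInfo t) {
        if (scn == null) {
            return true; // no SCN set: a normal read-write connection
        }
        return !(t.transactional || t.mutableWithIndex || t.hasRowTimestamp);
    }
}
```

This captures the change from "SCN implies read-only" to "SCN restricts writes only where a historical timestamp could corrupt transactional state, index consistency, or ROW_TIMESTAMP semantics".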
[jira] [Updated] (PHOENIX-4983) Allow using a connection with a SCN set to write data to tables EXCEPT transactional tables or mutable tables with indexes or tables with a ROW_TIMESTAMP column
[ https://issues.apache.org/jira/browse/PHOENIX-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas D'Silva updated PHOENIX-4983: Summary: Allow using a connection with a SCN set to write data to tables EXCEPT transactional tables or mutable tables with indexes or tables with a ROW_TIMESTAMP column (was: Allow using a connection with a SCN set to write data to tables EXCEPT transactional tables or mutable tables with indexes) > Allow using a connection with a SCN set to write data to tables EXCEPT > transactional tables or mutable tables with indexes or tables with a > ROW_TIMESTAMP column > > > Key: PHOENIX-4983 > URL: https://issues.apache.org/jira/browse/PHOENIX-4983 > Project: Phoenix > Issue Type: New Feature > Reporter: Thomas D'Silva > Assignee: Swaroopa Kadam > Priority: Major > Labels: SFDC > > Currently, if a SCN is set on a connection, it is read-only. We only need to > prevent a client from setting the timestamp for transactional tables or > mutable tables with global and local indexes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PHOENIX-5033) connect() method in PhoenixDriver should catch exception properly
Xu Cang created PHOENIX-5033: Summary: connect() method in PhoenixDriver should catch exception properly Key: PHOENIX-5033 URL: https://issues.apache.org/jira/browse/PHOENIX-5033 Project: Phoenix Issue Type: Bug Affects Versions: 4.13.0 Reporter: Xu Cang See this error in production: Problem executing query. *Stack trace: java.lang.IllegalMonitorStateException: attempt to unlock read lock, not locked by current thread* at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.unmatchedUnlockException(ReentrantReadWriteLock.java:444) at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryReleaseShared(ReentrantReadWriteLock.java:428) at java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(AbstractQueuedSynchronizer.java:1341) at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.unlock(ReentrantReadWriteLock.java:881) at org.apache.phoenix.jdbc.PhoenixDriver.unlock(PhoenixDriver.java:346) at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:223) at phoenix.connection.ProtectedPhoenixConnectionFactory$PhoenixConnectionFactory.createPhoenixConnection(ProtectedPhoenixConnectionFactory.java:233) at phoenix.connection.ProtectedPhoenixConnectionFactory.create(ProtectedPhoenixConnectionFactory.java:95) at phoenix.util.PhoenixConnectionUtil.getConnection(PhoenixConnectionUtil.java:59) at phoenix.util.PhoenixConnectionUtil.getConnection(PhoenixConnectionUtil.java:48) at pliny.db.PhoenixConnectionProviderImpl$ConnectionType$1.getConnection(PhoenixConnectionProviderImpl.java:158) at pliny.db.PhoenixConnectionProviderImpl.getGenericConnection(PhoenixConnectionProviderImpl.java:67) at communities.util.db.phoenix.ManagedPhoenixConnection.createManagedGenericConnection(ManagedPhoenixConnection.java:73) at communities.util.db.phoenix.ManagedPhoenixConnection.getGenericConnectionForAsyncOperation(ManagedPhoenixConnection.java:51) at communities.util.db.phoenix.AbstractAsyncPhoenixRequest.call(AbstractAsyncPhoenixRequest.java:183) at 
core.chatter.feeds.read.FeedEntityReadByUserPhoenixQuery.call(FeedEntityReadByUserPhoenixQuery.java:66) at communities.util.db.phoenix.AbstractAsyncPhoenixRequest.call(AbstractAsyncPhoenixRequest.java:1) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Questionable code:
{code:java}
@Override
public Connection connect(String url, Properties info) throws SQLException {
    if (!acceptsURL(url)) {
        return null;
    }
    try {
        lockInterruptibly(LockMode.READ);
        checkClosed();
        return createConnection(url, info);
    } finally {
        unlock(LockMode.READ);
    }
}
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
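The likely bug: lockInterruptibly is called inside the try, so if acquisition itself fails (e.g. the thread is interrupted), the finally still calls unlock on a read lock the thread never acquired, producing the IllegalMonitorStateException above. The standard pattern is to acquire the lock before entering the try block. A minimal sketch with a plain ReentrantReadWriteLock (PhoenixDriver's own lock wrapper is not reproduced here):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SafeLocking {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    int reads = 0;

    // Acquire BEFORE the try: if lockInterruptibly throws, control never
    // reaches the finally, so we never unlock a lock we do not hold.
    int readSafely() throws InterruptedException {
        lock.readLock().lockInterruptibly();
        try {
            return ++reads;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SafeLocking s = new SafeLocking();
        s.readSafely();
        s.readSafely();
        System.out.println(s.reads);
    }
}
```

Applied to PhoenixDriver.connect, the same reshuffle (lockInterruptibly above the try, body and unlock unchanged) would keep unlock paired with a successful acquisition.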
[jira] [Updated] (PHOENIX-4765) Add client and server side config property to enable rollback of splittable System Catalog if required
[ https://issues.apache.org/jira/browse/PHOENIX-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas D'Silva updated PHOENIX-4765: Attachment: PHOENIX-4765-v1.patch > Add client and server side config property to enable rollback of splittable > System Catalog if required > -- > > Key: PHOENIX-4765 > URL: https://issues.apache.org/jira/browse/PHOENIX-4765 > Project: Phoenix > Issue Type: Sub-task >Reporter: Thomas D'Silva >Assignee: Thomas D'Silva >Priority: Blocker > Attachments: PHOENIX-4765-v1.patch > > > After the server has been upgraded we will have a client and server side > config property that will allow us to rollback the upgrade if required. This > config will: > 1. Continue storing parent column metadata along with a child view > 2. Disallow metadata changes to a base table that require being propagated to > child views. > 3. Prevent SYSTEM.CATALOG from splitting. > If the client is older than the server we also disallow metadata changes to a > base table with child views since we no longer lock the parent on the server > side. This is handled on the client side as part of PHOENIX-3534. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PHOENIX-5032) add Apache Yetus to Phoenix
Artem Ervits created PHOENIX-5032: - Summary: add Apache Yetus to Phoenix Key: PHOENIX-5032 URL: https://issues.apache.org/jira/browse/PHOENIX-5032 Project: Phoenix Issue Type: Task Reporter: Artem Ervits Assignee: Artem Ervits Spoke with [~elserj]; Phoenix will benefit greatly from Yetus. -- This message was sent by Atlassian JIRA (v7.6.3#76005)