Re: [sqlalchemy] Flask SQlAlchemy BaseQuery Paginate not working correctly
Hey Simon, I am indeed getting back duplicates of aggregate ids. I reworked my query to using exists like you suggested and now I get back correct result count. Thanks so much for your suggestion and saving my day! I can now move ahead on my work ticket. Many, many, thanks again, appreciate it!! Prerna On Saturday, August 15, 2020 at 3:22:02 PM UTC-4, Simon King wrote: > > This is the same problem: you're writing a query that joins 3 tables > together, and then applying a "LIMIT 20" to that query. If you look > carefully at your 20 rows of psql output, I expect you'll see the same > aggregates_id appear more than once. There are less than 20 distinct > Aggregate objects. When SQLAlchemy receives those rows, it skips the > duplicates. > > The SQL looks something like this: > > SELECT ... FROM aggregates > LEFT JOIN aggregate_blocks ON ... > LEFT JOIN blocks ON ... > LIMIT 20 > > You need the LIMIT to apply just to the aggregates table, rather than > the joined tables. This is a bit tricky because you want to filter by > a field in one of those joined tables. You could use an EXISTS > function, something like this: > > SELECT * FROM aggregates > WHERE EXISTS ( > SELECT 1 > FROM aggregate_blocks > INNER JOIN blocks ON aggregate_blocks.block_id = block.id > WHERE aggregate_blocks.aggregate_id = aggregates.id > AND blocks.is_complete = false > ) > LIMIT 20 > > If you run that in psql, I think you should get 20 *different* > aggregates rows back. > > If that works, then to turn it into SQLAlchemy ORM syntax, you should > use Query.exists(): > > > https://docs.sqlalchemy.org/en/13/orm/query.html#sqlalchemy.orm.query.Query.exists > > > Hope that helps, > > Simon > > On Fri, Aug 14, 2020 at 3:56 PM Prerna Pandit > wrote: > > > > Hey Simon, > > > > Thanks so much for replying to my question. I reworked my code to use > sqlalchemy ORM and took off flask and paginate so I can narrow down the > issue. My models now extend from declarative_base. > > > > engine = > create_engine('postgresql://postgres:postgres@localhost:5400/postgres') > > Session = sessionmaker(bind=engine) > > > > Base = declarative_base() > > session = Session() > > > > Models > > > > class Aggregate(Base): > > __tablename__ = 'aggregates' > > id = Column(UUID(as_uuid=True), primary_key=True, > >server_default=text('uuid_generate_v4()')) > > site_id = Column(UUID(as_uuid=True), nullable=True) > > created_at = Column(DateTime, default=sa.func.now()) > > created_by = Column(UUID(as_uuid=True), nullable=True) > > updated_at = Column(DateTime, default=sa.func.now(), > onupdate=sa.func.now()) > > updated_by = Column(UUID(as_uuid=True), nullable=True) > > blocks = relationship('AggregateBlock', cascade='all, > delete-orphan', > > passive_deletes=True, > back_populates='aggregate') > > > > > > class Block(Base): > > > > __tablename__ = 'blocks' > > id = Column(UUID(as_uuid=True), primary_key=True, > >server_default=text('uuid_generate_v4()')) > > type = Column(Text, nullable=False) > > heading = Column(Text, nullable=True) > > subheading = Column(Text, nullable=True) > > label = Column(Text, nullable=True) > > is_complete = Column(Boolean, default=False) > > created_at = Column(DateTime, default=sa.func.now()) > > created_by = Column(UUID(as_uuid=True), nullable=True) > > updated_at = Column(DateTime, default=sa.func.now(), > onupdate=sa.func.now()) > > updated_by = Column(UUID(as_uuid=True), nullable=True) > > aggregates = relationship('AggregateBlock', cascade='all, > delete-orphan', > > passive_deletes=True, > back_populates='block') > > > > > > class AggregateBlock(Base): > > __tablename__ = 'aggregate_blocks' > > id = Column(UUID(as_uuid=True), primary_key=True, > >server_default=text('uuid_generate_v4()')) > > block_id = Column(UUID(as_uuid=True), > > ForeignKey('blocks.id', ondelete='CASCADE'), > nullable=False, index=True) > > aggregate_id = Column(UUID(as_uuid=True), > > ForeignKey('aggregates.id', > ondelete='RESTRICT'), nullable=False) > > position = Column(Integer, nullable=False) > > block = relationship('Block', back_populates='aggregates') > > aggregate = relationship('Aggregate', back_populates='blocks') > > > > > > Query: > > > > select = session.query(Aggregate).order_by(Aggregate.created_at) \ > > .join(AggregateBlock) \ > > .join(Block) \ > > .filter(Block.is_complete == complete) \ > > > > all_results = select.all() > > limit_results = select.limit(20).all() > > > > > > I still get inconsistent results when I apply limit. Like select.all() > will return 47 rows but with limit it'll return
Re: [sqlalchemy] Flask SQlAlchemy BaseQuery Paginate not working correctly
This is the same problem: you're writing a query that joins 3 tables together, and then applying a "LIMIT 20" to that query. If you look carefully at your 20 rows of psql output, I expect you'll see the same aggregates_id appear more than once. There are less than 20 distinct Aggregate objects. When SQLAlchemy receives those rows, it skips the duplicates. The SQL looks something like this: SELECT ... FROM aggregates LEFT JOIN aggregate_blocks ON ... LEFT JOIN blocks ON ... LIMIT 20 You need the LIMIT to apply just to the aggregates table, rather than the joined tables. This is a bit tricky because you want to filter by a field in one of those joined tables. You could use an EXISTS function, something like this: SELECT * FROM aggregates WHERE EXISTS ( SELECT 1 FROM aggregate_blocks INNER JOIN blocks ON aggregate_blocks.block_id = block.id WHERE aggregate_blocks.aggregate_id = aggregates.id AND blocks.is_complete = false ) LIMIT 20 If you run that in psql, I think you should get 20 *different* aggregates rows back. If that works, then to turn it into SQLAlchemy ORM syntax, you should use Query.exists(): https://docs.sqlalchemy.org/en/13/orm/query.html#sqlalchemy.orm.query.Query.exists Hope that helps, Simon On Fri, Aug 14, 2020 at 3:56 PM Prerna Pandit wrote: > > Hey Simon, > > Thanks so much for replying to my question. I reworked my code to use > sqlalchemy ORM and took off flask and paginate so I can narrow down the > issue. My models now extend from declarative_base. > > engine = > create_engine('postgresql://postgres:postgres@localhost:5400/postgres') > Session = sessionmaker(bind=engine) > > Base = declarative_base() > session = Session() > > Models > > class Aggregate(Base): > __tablename__ = 'aggregates' > id = Column(UUID(as_uuid=True), primary_key=True, >server_default=text('uuid_generate_v4()')) > site_id = Column(UUID(as_uuid=True), nullable=True) > created_at = Column(DateTime, default=sa.func.now()) > created_by = Column(UUID(as_uuid=True), nullable=True) > updated_at = Column(DateTime, default=sa.func.now(), > onupdate=sa.func.now()) > updated_by = Column(UUID(as_uuid=True), nullable=True) > blocks = relationship('AggregateBlock', cascade='all, delete-orphan', > passive_deletes=True, > back_populates='aggregate') > > > class Block(Base): > > __tablename__ = 'blocks' > id = Column(UUID(as_uuid=True), primary_key=True, >server_default=text('uuid_generate_v4()')) > type = Column(Text, nullable=False) > heading = Column(Text, nullable=True) > subheading = Column(Text, nullable=True) > label = Column(Text, nullable=True) > is_complete = Column(Boolean, default=False) > created_at = Column(DateTime, default=sa.func.now()) > created_by = Column(UUID(as_uuid=True), nullable=True) > updated_at = Column(DateTime, default=sa.func.now(), > onupdate=sa.func.now()) > updated_by = Column(UUID(as_uuid=True), nullable=True) > aggregates = relationship('AggregateBlock', cascade='all, delete-orphan', > passive_deletes=True, back_populates='block') > > > class AggregateBlock(Base): > __tablename__ = 'aggregate_blocks' > id = Column(UUID(as_uuid=True), primary_key=True, >server_default=text('uuid_generate_v4()')) > block_id = Column(UUID(as_uuid=True), > ForeignKey('blocks.id', ondelete='CASCADE'), > nullable=False, index=True) > aggregate_id = Column(UUID(as_uuid=True), > ForeignKey('aggregates.id', > ondelete='RESTRICT'), nullable=False) > position = Column(Integer, nullable=False) > block = relationship('Block', back_populates='aggregates') > aggregate = relationship('Aggregate', back_populates='blocks') > > > Query: > > select = session.query(Aggregate).order_by(Aggregate.created_at) \ > .join(AggregateBlock) \ > .join(Block) \ > .filter(Block.is_complete == complete) \ > > all_results = select.all() > limit_results = select.limit(20).all() > > > I still get inconsistent results when I apply limit. Like select.all() will > return 47 rows but with limit it'll return anywhere between 11 to 15. If I > take the generated SQL query and run it directly in psql, I get the correct > count. > > SELECT aggregates.id AS aggregates_id, aggregates.site_id AS > aggregates_site_id, aggregates.created_at AS aggregates_created_at, > aggregates.created_by AS aggregates_created_by, aggregates.updated_at AS > aggregates_updated_at, aggregates.updated_by AS aggregates_updated_by > FROM aggregates JOIN aggregate_blocks ON aggregates.id = > aggregate_blocks.aggregate_id JOIN blocks ON blocks.id = > aggregate_blocks.block_id > WHERE blocks.is_complete = false ORDER BY aggregates.created_at > > LIMIT 20 > > > > > > On Friday, August 14, 2020 at 10:08:23 AM UTC-4, Simon King wrote
Re: [sqlalchemy] Flask SQlAlchemy BaseQuery Paginate not working correctly
Hey Simon, Thanks so much for replying to my question. I reworked my code to use sqlalchemy ORM and took off flask and paginate so I can narrow down the issue. My models now extend from declarative_base. engine = create_engine('postgresql://postgres:postgres@localhost:5400/postgres') Session = sessionmaker(bind=engine) Base = declarative_base() session = Session() Models class Aggregate(Base): __tablename__ = 'aggregates' id = Column(UUID(as_uuid=True), primary_key=True, server_default=text('uuid_generate_v4()')) site_id = Column(UUID(as_uuid=True), nullable=True) created_at = Column(DateTime, default=sa.func.now()) created_by = Column(UUID(as_uuid=True), nullable=True) updated_at = Column(DateTime, default=sa.func.now(), onupdate=sa.func.now()) updated_by = Column(UUID(as_uuid=True), nullable=True) blocks = relationship('AggregateBlock', cascade='all, delete-orphan', passive_deletes=True, back_populates='aggregate') class Block(Base): __tablename__ = 'blocks' id = Column(UUID(as_uuid=True), primary_key=True, server_default=text('uuid_generate_v4()')) type = Column(Text, nullable=False) heading = Column(Text, nullable=True) subheading = Column(Text, nullable=True) label = Column(Text, nullable=True) is_complete = Column(Boolean, default=False) created_at = Column(DateTime, default=sa.func.now()) created_by = Column(UUID(as_uuid=True), nullable=True) updated_at = Column(DateTime, default=sa.func.now(), onupdate=sa.func.now()) updated_by = Column(UUID(as_uuid=True), nullable=True) aggregates = relationship('AggregateBlock', cascade='all, delete-orphan', passive_deletes=True, back_populates='block') class AggregateBlock(Base): __tablename__ = 'aggregate_blocks' id = Column(UUID(as_uuid=True), primary_key=True, server_default=text('uuid_generate_v4()')) block_id = Column(UUID(as_uuid=True), ForeignKey('blocks.id', ondelete='CASCADE'), nullable=False, index=True) aggregate_id = Column(UUID(as_uuid=True), ForeignKey('aggregates.id', ondelete='RESTRICT'), nullable=False) position = Column(Integer, nullable=False) block = relationship('Block', back_populates='aggregates') aggregate = relationship('Aggregate', back_populates='blocks') Query: select = session.query(Aggregate).order_by(Aggregate.created_at) \ .join(AggregateBlock) \ .join(Block) \ .filter(Block.is_complete == complete) \ all_results = select.all() limit_results = select.limit(20).all() I still get inconsistent results when I apply limit. Like select.all() will return 47 rows but with limit it'll return anywhere between 11 to 15. If I take the generated SQL query and run it directly in psql, I get the correct count. SELECT aggregates.id AS aggregates_id, aggregates.site_id AS aggregates_site_id, aggregates.created_at AS aggregates_created_at, aggregates.created_by AS aggregates_created_by, aggregates.updated_at AS aggregates_updated_at, aggregates.updated_by AS aggregates_updated_by FROM aggregates JOIN aggregate_blocks ON aggregates.id = aggregate_blocks.aggregate_id JOIN blocks ON blocks.id = aggregate_blocks.block_id WHERE blocks.is_complete = false ORDER BY aggregates.created_at LIMIT 20 On Friday, August 14, 2020 at 10:08:23 AM UTC-4, Simon King wrote: > > "paginate" is not an SQLAlchemy function, so you'd be better off > asking the author of whatever is providing that feature. > > However, I would guess that maybe paginate is naively applying > something like "LIMIT 20" to the query. This doesn't work properly > when you join along a one-to-many relationship, because if you have > (for example) 2 "parent" objects, each with 5 "child" objects, the > query will return 10 rows, but SQLAlchemy de-duplicates the results to > return just the 2 parent objects. > > Simon > > On Thu, Aug 13, 2020 at 3:31 PM Prerna Pandit > wrote: > > > > Hello, I've been struggling with this issue for the past couple of days > and would really, truly appreciate if someone could please give me > pointers or direction as to what I might be missing. > > > > > > Here are my models; > > class Aggregate(db.Model): > > id = db.Column(UUID(as_uuid=True), primary_key=True, > >server_default=db.text('uuid_generate_v4()')) > > blocks = db.relationship('AggregateBlock', cascade='all, > delete-orphan', > > passive_deletes=True, > back_populates='aggregate').. > > > > > > class AggregateBlock(db.Model): > > > > id = db.Column(UUID(as_uuid=True), primary_key=True, > >server_default=db.text('uuid_generate_v4()')) > > block_id = db.Column(UUID(as_uuid=True), > >
Re: [sqlalchemy] Flask SQlAlchemy BaseQuery Paginate not working correctly
"paginate" is not an SQLAlchemy function, so you'd be better off asking the author of whatever is providing that feature. However, I would guess that maybe paginate is naively applying something like "LIMIT 20" to the query. This doesn't work properly when you join along a one-to-many relationship, because if you have (for example) 2 "parent" objects, each with 5 "child" objects, the query will return 10 rows, but SQLAlchemy de-duplicates the results to return just the 2 parent objects. Simon On Thu, Aug 13, 2020 at 3:31 PM Prerna Pandit wrote: > > Hello, I've been struggling with this issue for the past couple of days and > would really, truly appreciate if someone could please give me pointers or > direction as to what I might be missing. > > > Here are my models; > class Aggregate(db.Model): > id = db.Column(UUID(as_uuid=True), primary_key=True, >server_default=db.text('uuid_generate_v4()')) > blocks = db.relationship('AggregateBlock', cascade='all, delete-orphan', > passive_deletes=True, > back_populates='aggregate').. > > > class AggregateBlock(db.Model): > > id = db.Column(UUID(as_uuid=True), primary_key=True, >server_default=db.text('uuid_generate_v4()')) > block_id = db.Column(UUID(as_uuid=True), > db.ForeignKey('blocks.id', ondelete='CASCADE'), > nullable=False, index=True) > aggregate_id = db.Column(UUID(as_uuid=True), > db.ForeignKey('aggregates.id', > ondelete='RESTRICT'), nullable=False) > block = db.relationship('Block', back_populates='aggregates') > aggregate = db.relationship('Aggregate', back_populates='blocks') > > > > > class Block(db.Model): > id = db.Column(UUID(as_uuid=True), primary_key=True, >server_default=db.text('uuid_generate_v4()')) > is_complete = db.Column(db.Boolean, default=False) > aggregates = db.relationship('AggregateBlock', cascade='all, > delete-orphan', > passive_deletes=True, back_populates='block') > > > from flask_sqlalchemy import SQLAlchemy > db = SQLAlchemy() > > select = > db.session.query(Aggregate).join(AggregateBlock).join(Block).filter(Block.is_complete > == complete) > > print(len(select.all()) > > print(len(select.paginate(per_page=20).items()) > > > If I do a select.all(), I get the right number of rows which is 47. However, > if I try to paginate for a per_page size say 20, I lot a less rows like 11. > select.paginate(per_page=20). > The number could go up to 21 or so as I increase the page size. Why would > paginate decrease the number of returned records? > > -- > SQLAlchemy - > The Python SQL Toolkit and Object Relational Mapper > > http://www.sqlalchemy.org/ > > To post example code, please provide an MCVE: Minimal, Complete, and > Verifiable Example. See http://stackoverflow.com/help/mcve for a full > description. > --- > You received this message because you are subscribed to the Google Groups > "sqlalchemy" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to sqlalchemy+unsubscr...@googlegroups.com. > To view this discussion on the web visit > https://groups.google.com/d/msgid/sqlalchemy/e5a14305-4e63-4467-9610-1faf3f8c8412o%40googlegroups.com. -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/CAFHwexeg0K1URZZbi4oaQHtLFkCVbE3Fn2%2BY%3DDaaZymUp9S-qg%40mail.gmail.com.